[jira] [Resolved] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48523.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46862 [https://github.com/apache/spark/pull/46862]

Key: SPARK-48523
URL: https://issues.apache.org/jira/browse/SPARK-48523
Project: Spark
Issue Type: Improvement
Components: Connect, Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
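For context, a Spark Connect client connection string packs options as semicolon-separated `key=value` pairs after the `sc://host:port/` prefix, and `grpc_max_message_size` is one such option (it caps the size of gRPC messages the client accepts). A minimal sketch of how such a string decomposes, using a hypothetical stdlib-only helper rather than Spark's actual parser:

```python
def parse_connect_params(conn: str) -> dict:
    """Split 'sc://host:port/;k1=v1;k2=v2' into its option dict.

    Toy illustration of the connection-string shape documented in
    client-connection-string.md; not the real Spark Connect parser.
    """
    _, _, rest = conn.partition("://")
    # Everything after the first ';' is the semicolon-separated option list.
    _, _, param_str = rest.partition(";")
    return dict(
        pair.partition("=")[::2]  # ("key", "=", "value") -> ("key", "value")
        for pair in param_str.split(";")
        if pair
    )

params = parse_connect_params(
    "sc://localhost:15002/;grpc_max_message_size=134217728;use_ssl=true"
)
```

Here `params["grpc_max_message_size"]` comes back as the string `"134217728"` (128 MiB); the real client converts and applies it when building the gRPC channel.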
[jira] [Assigned] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48523:

Assignee: BingKun Pan

Key: SPARK-48523
URL: https://issues.apache.org/jira/browse/SPARK-48523
Project: Spark
Issue Type: Improvement
Components: Connect, Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
[ https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48485.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46819 [https://github.com/apache/spark/pull/46819]

Key: SPARK-48485
URL: https://issues.apache.org/jira/browse/SPARK-48485
Project: Spark
Issue Type: Improvement
Components: Connect, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
Assignee: Hyukjin Kwon
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48482:

Assignee: Wei Liu

Key: SPARK-48482
URL: https://issues.apache.org/jira/browse/SPARK-48482
Project: Spark
Issue Type: New Feature
Components: PySpark
Affects Versions: 4.0.0
Reporter: Wei Liu
Assignee: Wei Liu
Priority: Major
Labels: pull-request-available
[jira] [Resolved] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48482.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46817 [https://github.com/apache/spark/pull/46817]

Key: SPARK-48482
URL: https://issues.apache.org/jira/browse/SPARK-48482
Project: Spark
Issue Type: New Feature
Components: PySpark
Affects Versions: 4.0.0
Reporter: Wei Liu
Assignee: Wei Liu
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
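Accepting varargs alongside the existing list form is a small API-normalization pattern. A sketch of the idea in plain Python (hypothetical helper name; the real change lives inside PySpark's `DataFrame` methods):

```python
def normalize_subset(*subset):
    """Accept both call styles for a column subset:

        normalize_subset("a", "b")      # varargs, the new style
        normalize_subset(["a", "b"])    # single list, the old style
    """
    if len(subset) == 1 and isinstance(subset[0], (list, tuple)):
        # A single list/tuple argument is unpacked for backward compatibility.
        return list(subset[0])
    return list(subset)
```

With this in place, `dropDuplicates("a", "b")` and `dropDuplicates(["a", "b"])` can resolve to the same column subset.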
[jira] [Assigned] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48508:

Assignee: Ruifeng Zheng

Key: SPARK-48508
URL: https://issues.apache.org/jira/browse/SPARK-48508
Project: Spark
Issue Type: Umbrella
Components: Connect
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
[jira] [Resolved] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48508.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46848 [https://github.com/apache/spark/pull/46848]

Key: SPARK-48508
URL: https://issues.apache.org/jira/browse/SPARK-48508
Project: Spark
Issue Type: Umbrella
Components: Connect
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48507.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46846 [https://github.com/apache/spark/pull/46846]

Key: SPARK-48507
URL: https://issues.apache.org/jira/browse/SPARK-48507
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48507:

Assignee: BingKun Pan

Key: SPARK-48507
URL: https://issues.apache.org/jira/browse/SPARK-48507
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Assigned] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48504:

Assignee: Ruifeng Zheng

Key: SPARK-48504
URL: https://issues.apache.org/jira/browse/SPARK-48504
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
[jira] [Resolved] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48504.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46841 [https://github.com/apache/spark/pull/46841]

Key: SPARK-48504
URL: https://issues.apache.org/jira/browse/SPARK-48504
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48496) Use static regex Pattern instances in common/utils JavaUtils
[ https://issues.apache.org/jira/browse/SPARK-48496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48496.

Fix Version/s: 4.0.0
Resolution: Fixed
Fixed in https://github.com/apache/spark/pull/46829

Key: SPARK-48496
URL: https://issues.apache.org/jira/browse/SPARK-48496
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

Some methods in JavaUtils.java are recompiling regexes on every invocation; we should instead store a single cached Pattern. This is a minor performance issue that I spotted in the context of other profiling. Not a huge bottleneck in the grand scheme of things, but simple and straightforward to fix.
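The pattern being fixed is generic: compile a regex once at load time instead of on every call. JavaUtils is Java, so the sketch below is only a Python analogue with a hypothetical pattern and function name (Python's `re` module additionally keeps its own internal compile cache, which Java's `Pattern.compile` does not):

```python
import re

# Compiled once at module load and reused on every call -- the "static
# Pattern instance" idea from the fix, translated to Python.
_TIME_SUFFIX = re.compile(r"(-?[0-9]+)([a-z]+)?")

def parse_time_string(s: str):
    """Parse strings like '10s' or '-5' into (value, unit_suffix)."""
    m = _TIME_SUFFIX.fullmatch(s.strip().lower())
    if not m:
        raise ValueError(f"invalid time string: {s!r}")
    value, suffix = m.groups()
    return int(value), suffix

# The anti-pattern removed by the fix would be calling re.compile(...)
# (or Pattern.compile(...) in Java) inside the function body, paying the
# compilation cost on every invocation.
```

Hoisting the compiled pattern to a module-level (or `static final`, in Java) constant makes the per-call cost just a match, not a compile.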
[jira] [Resolved] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48489.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46823 [https://github.com/apache/spark/pull/46823]

Key: SPARK-48489
URL: https://issues.apache.org/jira/browse/SPARK-48489
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.3
Reporter: Stefan Bukorovic
Assignee: Stefan Bukorovic
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0

The text DataSource produces a table schema with only 1 column, but it is possible to try to create a table with a schema having multiple columns. Currently, when a user tries this, an assert in the code fails and throws an internal Spark error. We should throw a better user-facing error.
[jira] [Assigned] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48489:

Assignee: Stefan Bukorovic

Key: SPARK-48489
URL: https://issues.apache.org/jira/browse/SPARK-48489
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.3
Reporter: Stefan Bukorovic
Assignee: Stefan Bukorovic
Priority: Minor
Labels: pull-request-available

The text DataSource produces a table schema with only 1 column, but it is possible to try to create a table with a schema having multiple columns. Currently, when a user tries this, an assert in the code fails and throws an internal Spark error. We should throw a better user-facing error.
[jira] [Assigned] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48374:

Assignee: Ian Cook

Key: SPARK-48374
URL: https://issues.apache.org/jira/browse/SPARK-48374
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available

SPARK-48220 adds support for passing a PyArrow Table to {{createDataFrame()}}, but there are a few PyArrow column types that are not yet supported:
 * fixed-size binary
 * fixed-size list
 * large list
[jira] [Resolved] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48374.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46688 [https://github.com/apache/spark/pull/46688]

Key: SPARK-48374
URL: https://issues.apache.org/jira/browse/SPARK-48374
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

SPARK-48220 adds support for passing a PyArrow Table to {{createDataFrame()}}, but there are a few PyArrow column types that are not yet supported:
 * fixed-size binary
 * fixed-size list
 * large list
[jira] [Assigned] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48220:

Assignee: Ian Cook

Key: SPARK-48220
URL: https://issues.apache.org/jira/browse/SPARK-48220
Project: Spark
Issue Type: Sub-task
Components: Connect, Input/Output, PySpark, SQL
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available

SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. It would be nice if we could also go in the opposite direction, enabling users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow Table to {{spark.createDataFrame()}}.
[jira] [Resolved] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48220.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46529 [https://github.com/apache/spark/pull/46529]

Key: SPARK-48220
URL: https://issues.apache.org/jira/browse/SPARK-48220
Project: Spark
Issue Type: Sub-task
Components: Connect, Input/Output, PySpark, SQL
Affects Versions: 4.0.0, 3.5.1
Reporter: Ian Cook
Assignee: Ian Cook
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. It would be nice if we could also go in the opposite direction, enabling users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow Table to {{spark.createDataFrame()}}.
[jira] [Created] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
Hyukjin Kwon created SPARK-48485:

Summary: Support interruptTag and interruptAll in streaming queries
Key: SPARK-48485
URL: https://issues.apache.org/jira/browse/SPARK-48485
Project: Spark
Issue Type: Improvement
Components: Connect, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48474:

Assignee: BingKun Pan

Key: SPARK-48474
URL: https://issues.apache.org/jira/browse/SPARK-48474
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48474.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46808 [https://github.com/apache/spark/pull/46808]

Key: SPARK-48474
URL: https://issues.apache.org/jira/browse/SPARK-48474
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48467:

Assignee: BingKun Pan

Key: SPARK-48467
URL: https://issues.apache.org/jira/browse/SPARK-48467
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48467.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46798 [https://github.com/apache/spark/pull/46798]

Key: SPARK-48467
URL: https://issues.apache.org/jira/browse/SPARK-48467
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan
Assignee: BingKun Pan
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47716:

Assignee: Jack Chen

Key: SPARK-47716
URL: https://issues.apache.org/jira/browse/SPARK-47716
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jack Chen
Assignee: Jack Chen
Priority: Major
Labels: pull-request-available

In SQLQueryTestSuite, the test case "Test logic for determining whether a query is semantically sorted" can sometimes fail with the error {{Cannot create table or view `main`.`default`.`t1` because it already exists.}} if run concurrently with other SQL test cases that also create tables with the same name.
[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47716.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 45855 [https://github.com/apache/spark/pull/45855]

Key: SPARK-47716
URL: https://issues.apache.org/jira/browse/SPARK-47716
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jack Chen
Assignee: Jack Chen
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

In SQLQueryTestSuite, the test case "Test logic for determining whether a query is semantically sorted" can sometimes fail with the error {{Cannot create table or view `main`.`default`.`t1` because it already exists.}} if run concurrently with other SQL test cases that also create tables with the same name.
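A common remedy for this class of flakiness is to give each concurrent test run a unique object name instead of a shared literal like `t1`. The sketch below shows that idea in plain Python with a hypothetical helper; it illustrates the general technique only, and may not match how the linked pull request actually fixes the suite:

```python
import uuid

def unique_view_name(prefix: str = "t") -> str:
    """Generate a collision-free table/view name for concurrent test runs.

    uuid4 gives 122 bits of randomness, so two concurrently running test
    cases will not pick the same name in practice.
    """
    return f"{prefix}_{uuid.uuid4().hex}"

# Each test run gets its own name, so CREATE TABLE never collides.
name_a = unique_view_name()
name_b = unique_view_name()
```

The generated name would then be used in the test's `CREATE TABLE`/`DROP TABLE` statements in place of the fixed `t1`.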
[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48461.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46793 [https://github.com/apache/spark/pull/46793]

Key: SPARK-48461
URL: https://issues.apache.org/jira/browse/SPARK-48461
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Daniel
Assignee: Daniel
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0

[Code location here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]
[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48446:

Assignee: Yuchen Liu

Key: SPARK-48446
URL: https://issues.apache.org/jira/browse/SPARK-48446
Project: Spark
Issue Type: Documentation
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Yuchen Liu
Assignee: Yuchen Liu
Priority: Minor
Labels: easyfix, pull-request-available
Original Estimate: 1h
Remaining Estimate: 1h

For dropDuplicates, the example on [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] is out of date compared with [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. The argument should be a list. The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48446.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46797 [https://github.com/apache/spark/pull/46797]

Key: SPARK-48446
URL: https://issues.apache.org/jira/browse/SPARK-48446
Project: Spark
Issue Type: Documentation
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Yuchen Liu
Assignee: Yuchen Liu
Priority: Minor
Labels: easyfix, pull-request-available
Fix For: 4.0.0
Original Estimate: 1h
Remaining Estimate: 1h

For dropDuplicates, the example on [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] is out of date compared with [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. The argument should be a list. The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.
[ https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48475.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46809 [https://github.com/apache/spark/pull/46809]

Key: SPARK-48475
URL: https://issues.apache.org/jira/browse/SPARK-48475
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 4.0.0
Reporter: Chenhao Li
Assignee: Chenhao Li
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48464) Refactor SQLConfSuite and StatisticsSuite
[ https://issues.apache.org/jira/browse/SPARK-48464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48464.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46796 [https://github.com/apache/spark/pull/46796]

Key: SPARK-48464
URL: https://issues.apache.org/jira/browse/SPARK-48464
Project: Spark
Issue Type: Sub-task
Components: Tests
Affects Versions: 4.0.0
Reporter: Rui Wang
Assignee: Rui Wang
Priority: Major
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Resolved] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48454.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46785 [https://github.com/apache/spark/pull/46785]

Key: SPARK-48454
URL: https://issues.apache.org/jira/browse/SPARK-48454
Project: Spark
Issue Type: Improvement
Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Minor
Labels: pull-request-available
Fix For: 4.0.0
[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48454:

Assignee: Ruifeng Zheng

Key: SPARK-48454
URL: https://issues.apache.org/jira/browse/SPARK-48454
Project: Spark
Issue Type: Improvement
Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
Priority: Minor
Labels: pull-request-available
[jira] [Resolved] (SPARK-48442) Add parenthesis to awaitTermination call
[ https://issues.apache.org/jira/browse/SPARK-48442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48442.

Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46779 [https://github.com/apache/spark/pull/46779]

Key: SPARK-48442
URL: https://issues.apache.org/jira/browse/SPARK-48442
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.4.3
Reporter: Riya Verma
Assignee: Riya Verma
Priority: Trivial
Labels: correctness, pull-request-available, starter
Fix For: 4.0.0

In {{test_stream_reader}} and {{test_stream_writer}} of *test_python_streaming_datasource.py*, the call {{q.awaitTermination}} does not invoke a function call as intended, but instead returns a Python function object. The fix is to change this to {{q.awaitTermination()}}.
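This bug class is easy to reproduce in plain Python: referencing a method without parentheses yields the bound method object, which is always truthy, so a test that meant to block on termination silently does nothing. A small stand-in class (hypothetical, not Spark's `StreamingQuery`) shows the difference:

```python
class FakeQuery:
    """Stand-in for a streaming query handle, for illustration only."""

    def awaitTermination(self, timeout=None):
        # The real method blocks until the query stops; here we just
        # return a sentinel so the two call styles can be compared.
        return "terminated"

q = FakeQuery()
no_call = q.awaitTermination    # missing (): a bound method object, nothing runs
result = q.awaitTermination()   # actual invocation
```

Because `no_call` is truthy, an `assert q.awaitTermination` would pass without ever waiting, which is why the issue is labeled `correctness`.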
[jira] [Created] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
Hyukjin Kwon created SPARK-48459: Summary: Implement DataFrameQueryContext in Spark Connect Key: SPARK-48459 URL: https://issues.apache.org/jira/browse/SPARK-48459 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Implements the same https://github.com/apache/spark/pull/45377 in Spark Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48445. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46780 [https://github.com/apache/spark/pull/46780] > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
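The duplication risk SPARK-48445 describes can be seen with a plain Python stand-in: collapsing two projections inlines the intermediate expression into each consumer, so a non-cheap computation runs once per use instead of once overall. This is a conceptual sketch of the cost argument, not Catalyst code:

```python
calls = {"n": 0}

def expensive(x):
    """Stand-in for a non-cheap child expression inside a UDF."""
    calls["n"] += 1
    return x * 2

# Before collapsing: the intermediate column is computed once.
a = expensive(21)
b, c = a + 1, a - 1
assert calls["n"] == 1

calls["n"] = 0
# After naive inlining (what happens when CollapseProject.isCheap
# misjudges the expression): the non-cheap call is duplicated.
b, c = expensive(21) + 1, expensive(21) - 1
assert calls["n"] == 2
```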
[jira] [Assigned] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48445: Assignee: Kelvin Jiang > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850230#comment-17850230 ] Hyukjin Kwon commented on SPARK-23015: -- Fixed in https://github.com/apache/spark/pull/43706 > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
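The collision hypothesis above is a textbook birthday problem: with only 32768 possible %RANDOM% values, simultaneous launches clash far sooner than intuition suggests. A quick estimate, plus the kind of collision-resistant name (here via `uuid`, an illustrative choice, not the actual fix) that avoids the issue:

```python
import uuid

def collision_probability(jobs: int, space: int = 32768) -> float:
    """Birthday-problem estimate: chance that two of `jobs` parallel
    launches draw the same value from `space` possibilities."""
    p_unique = 1.0
    for i in range(jobs):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

# Even modest parallelism makes a clash plausible:
# 10 jobs -> roughly 0.1%, 200 jobs -> roughly 45%.

# A collision-resistant alternative to %RANDOM%:
unique_name = f"spark-class-launcher-output-{uuid.uuid4().hex}.txt"
```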
[jira] [Reopened] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-23015: -- > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23015. -- Fix Version/s: 4.0.0 Resolution: Fixed > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42965: Assignee: Ruifeng Zheng > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because metadata is > different something like `\{'__autoGeneratedAlias': 'true'}` but they have > same name, type and nullable, so the function just works well. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48322. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because metadata is > different something like `\{'__autoGeneratedAlias': 'true'}` but they have > same name, type and nullable, so the function just works well. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
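The fix for the pair of issues above amounts to filtering internal-only entries such as `__autoGeneratedAlias` out of field metadata before exposing the schema. A simplified, dict-based sketch of that filtering (real Spark works with `StructField` objects, and the set of internal keys here is illustrative):

```python
INTERNAL_METADATA_KEYS = {"__autoGeneratedAlias"}  # illustrative; the real list may differ

def strip_internal_metadata(fields):
    """Return schema fields with internal-only metadata keys removed."""
    return [
        {**f, "metadata": {k: v for k, v in f.get("metadata", {}).items()
                           if k not in INTERNAL_METADATA_KEYS}}
        for f in fields
    ]

fields = [{"name": "bool", "type": "long", "nullable": False,
           "metadata": {"__autoGeneratedAlias": "true"}}]
cleaned = strip_internal_metadata(fields)
assert cleaned[0]["metadata"] == {}
assert cleaned[0]["name"] == "bool"  # everything else is preserved
```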
[jira] [Assigned] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48322: Assignee: Ruifeng Zheng > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48438) Directly use the parent column class
[ https://issues.apache.org/jira/browse/SPARK-48438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48438. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46775 [https://github.com/apache/spark/pull/46775] > Directly use the parent column class > > > Key: SPARK-48438 > URL: https://issues.apache.org/jira/browse/SPARK-48438 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48434. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46764 [https://github.com/apache/spark/pull/46764] > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
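The motivation for SPARK-48434 is avoiding a redundant round trip: once the client has fetched the schema, `printSchema` can reuse it. A sketch of that caching pattern with hypothetical names, not the actual Spark Connect implementation:

```python
class Frame:
    """Simplified client-side frame that caches its schema so that
    printing it avoids a second server round trip (hypothetical names)."""
    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema  # stand-in for an RPC to the server
        self._cached_schema = None
        self.rpc_calls = 0

    @property
    def schema(self):
        if self._cached_schema is None:
            self.rpc_calls += 1
            self._cached_schema = self._fetch_schema()
        return self._cached_schema

    def print_schema(self):
        # Reuses the cached schema instead of issuing a new request.
        print(self.schema)

f = Frame(lambda: "root\n |-- id: long (nullable = true)")
f.print_schema()
f.print_schema()
assert f.rpc_calls == 1  # schema fetched once, reused afterwards
```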
[jira] [Assigned] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48434: Assignee: Ruifeng Zheng > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48432: Assignee: Vladimir Golubev > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48432. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46759 [https://github.com/apache/spark/pull/46759] > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48425. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46751 [https://github.com/apache/spark/pull/46751] > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
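The renaming behavior described above boils down to the sdist filename normalization that newer setuptools applies: runs of dashes, underscores, and dots in the project name become a single underscore. A rough approximation of that normalization, not the setuptools source itself:

```python
import re

def normalize_sdist_name(name: str) -> str:
    """Approximation of the PEP 625-style normalization newer setuptools
    applies to sdist filenames: runs of -, _ and . collapse to one underscore."""
    return re.sub(r"[-_.]+", "_", name)

assert normalize_sdist_name("pyspark-connect") == "pyspark_connect"
assert normalize_sdist_name("some.pkg-name") == "some_pkg_name"
```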
[jira] [Assigned] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48425: Assignee: Hyukjin Kwon > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48425: - Description: The issue is at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 was: The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
Hyukjin Kwon created SPARK-48425: Summary: Replaces pyspark-connect to pyspark_connect for its output name Key: SPARK-48425 URL: https://issues.apache.org/jira/browse/SPARK-48425 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48424. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46749 [https://github.com/apache/spark/pull/46749] > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
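The behavior this issue asks for is fail-open change detection: if the detection script itself crashes, CI should assume something changed and run the build anyway. The pattern can be sketched as follows (hypothetical helper, not the actual dev/is-changed.py code):

```python
def is_changed_fail_open(detect_changes) -> bool:
    """Run a change-detection callable; if it raises, err on the side
    of running the build by reporting True (fail-open)."""
    try:
        return detect_changes()
    except Exception:
        return True

# Normal operation passes the result through...
assert is_changed_fail_open(lambda: False) is False
assert is_changed_fail_open(lambda: True) is True

# ...while a crash (e.g. a failed git query) defaults to "changed".
def boom():
    raise RuntimeError("git query failed")
assert is_changed_fail_open(boom) is True
```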
[jira] [Assigned] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48424: Assignee: Hyukjin Kwon > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48424) Make dev/is-changed.py return true if it fails
Hyukjin Kwon created SPARK-48424: Summary: Make dev/is-changed.py return true if it fails Key: SPARK-48424 URL: https://issues.apache.org/jira/browse/SPARK-48424 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0, 3.5.2 Reporter: Hyukjin Kwon e.g., https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48370. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46683 [https://github.com/apache/spark/pull/46683] > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48370: Assignee: Hyukjin Kwon > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48393: Assignee: Ruifeng Zheng > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48393. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46710 [https://github.com/apache/spark/pull/46710] > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48379: -- Assignee: (was: Stefan Kandic) Reverted in https://github.com/apache/spark/commit/9fd85d9acc5acf455d0ad910ef2848695576242b > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48379: - Fix Version/s: (was: 4.0.0) > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48389. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46703 [https://github.com/apache/spark/pull/46703] > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > After https://github.com/apache/spark/pull/46689, we don't need this anymore
[jira] [Assigned] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48389: Assignee: Hyukjin Kwon > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > After https://github.com/apache/spark/pull/46689, we don't need this anymore
[jira] [Created] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
Hyukjin Kwon created SPARK-48389: Summary: Remove obsolete workflow cancel_duplicate_workflow_runs Key: SPARK-48389 URL: https://issues.apache.org/jira/browse/SPARK-48389 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon After https://github.com/apache/spark/pull/46689, we don't need this anymore
[jira] [Assigned] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48379: Assignee: Stefan Kandic > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds.
[jira] [Resolved] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48379. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46689 [https://github.com/apache/spark/pull/46689] > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds.
[jira] [Resolved] (SPARK-48341) Allow Spark Connect plugins to use QueryTest in their tests
[ https://issues.apache.org/jira/browse/SPARK-48341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48341. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46667 [https://github.com/apache/spark/pull/46667] > Allow Spark Connect plugins to use QueryTest in their tests > --- > > Key: SPARK-48341 > URL: https://issues.apache.org/jira/browse/SPARK-48341 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
Hyukjin Kwon created SPARK-48370: Summary: Checkpoint and localCheckpoint in Scala Spark Connect client Key: SPARK-48370 URL: https://issues.apache.org/jira/browse/SPARK-48370 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark Connect client. We should do it in Scala too.
[jira] [Updated] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48370: - Issue Type: Improvement (was: Bug) > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too.
[jira] [Resolved] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48367. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46679 [https://github.com/apache/spark/pull/46679] > Fix lint-scala for scalafmt to detect properly > -- > > Key: SPARK-48367 > URL: https://issues.apache.org/jira/browse/SPARK-48367 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > ./build/mvn \ > -Pscala-2.13 \ > scalafmt:format \ > -Dscalafmt.skip=false \ > -Dscalafmt.validateOnly=true \ > -Dscalafmt.changedOnly=false \ > -pl connector/connect/common \ > -pl connector/connect/server \ > -pl connector/connect/client/jvm > {code} > fails as below: > {code} > [INFO] Scalafmt results: 1 of 36 were unformatted > [INFO] Details: > [INFO] - Requires formatting: ConnectProtoUtils.scala > [INFO] - Formatted: UdfUtils.scala > [INFO] - Formatted: DataTypeProtoConverter.scala > [INFO] - Formatted: ConnectCommon.scala > [INFO] - Formatted: ProtoUtils.scala > [INFO] - Formatted: Abbreviator.scala > [INFO] - Formatted: ProtoDataTypes.scala > [INFO] - Formatted: LiteralValueProtoConverter.scala > [INFO] - Formatted: InvalidPlanInput.scala > [INFO] - Formatted: ForeachWriterPacket.scala > [INFO] - Formatted: StreamingListenerPacket.scala > [INFO] - Formatted: StorageLevelProtoConverter.scala > [INFO] - Formatted: UdfPacket.scala > [INFO] - Formatted: ClassFinder.scala > [INFO] - Formatted: SparkConnectClient.scala > [INFO] - Formatted: GrpcRetryHandler.scala > [INFO] - Formatted: GrpcExceptionConverter.scala > [INFO] - Formatted: ArrowEncoderUtils.scala > [INFO] - Formatted: ScalaCollectionUtils.scala > [INFO] - Formatted: ArrowDeserializer.scala > [INFO] - Formatted: ArrowVectorReader.scala > [INFO] - Formatted: ArrowSerializer.scala > [INFO] - 
Formatted: ConcatenatingArrowStreamReader.scala > [INFO] - Formatted: RetryPolicy.scala > [INFO] - Formatted: SparkConnectStubState.scala > [INFO] - Formatted: ArtifactManager.scala > [INFO] - Formatted: SparkResult.scala > [INFO] - Formatted: RetriesExceeded.scala > [INFO] - Formatted: CloseableIterator.scala > [INFO] - Formatted: package.scala > [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala > [INFO] - Formatted: ResponseValidator.scala > [INFO] - Formatted: SparkConnectClientParser.scala > [INFO] - Formatted: CustomSparkConnectStub.scala > [INFO] - Formatted: CustomSparkConnectBlockingStub.scala > [INFO] - Formatted: TestUDFs.scala > {code} > This is because the output format has changed due to scalafmt version upgrade.
[jira] [Assigned] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48367: Assignee: Hyukjin Kwon > Fix lint-scala for scalafmt to detect properly > -- > > Key: SPARK-48367 > URL: https://issues.apache.org/jira/browse/SPARK-48367 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > ./build/mvn \ > -Pscala-2.13 \ > scalafmt:format \ > -Dscalafmt.skip=false \ > -Dscalafmt.validateOnly=true \ > -Dscalafmt.changedOnly=false \ > -pl connector/connect/common \ > -pl connector/connect/server \ > -pl connector/connect/client/jvm > {code} > fails as below: > {code} > [INFO] Scalafmt results: 1 of 36 were unformatted > [INFO] Details: > [INFO] - Requires formatting: ConnectProtoUtils.scala > [INFO] - Formatted: UdfUtils.scala > [INFO] - Formatted: DataTypeProtoConverter.scala > [INFO] - Formatted: ConnectCommon.scala > [INFO] - Formatted: ProtoUtils.scala > [INFO] - Formatted: Abbreviator.scala > [INFO] - Formatted: ProtoDataTypes.scala > [INFO] - Formatted: LiteralValueProtoConverter.scala > [INFO] - Formatted: InvalidPlanInput.scala > [INFO] - Formatted: ForeachWriterPacket.scala > [INFO] - Formatted: StreamingListenerPacket.scala > [INFO] - Formatted: StorageLevelProtoConverter.scala > [INFO] - Formatted: UdfPacket.scala > [INFO] - Formatted: ClassFinder.scala > [INFO] - Formatted: SparkConnectClient.scala > [INFO] - Formatted: GrpcRetryHandler.scala > [INFO] - Formatted: GrpcExceptionConverter.scala > [INFO] - Formatted: ArrowEncoderUtils.scala > [INFO] - Formatted: ScalaCollectionUtils.scala > [INFO] - Formatted: ArrowDeserializer.scala > [INFO] - Formatted: ArrowVectorReader.scala > [INFO] - Formatted: ArrowSerializer.scala > [INFO] - Formatted: ConcatenatingArrowStreamReader.scala > [INFO] - Formatted: RetryPolicy.scala > [INFO] - Formatted: 
SparkConnectStubState.scala > [INFO] - Formatted: ArtifactManager.scala > [INFO] - Formatted: SparkResult.scala > [INFO] - Formatted: RetriesExceeded.scala > [INFO] - Formatted: CloseableIterator.scala > [INFO] - Formatted: package.scala > [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala > [INFO] - Formatted: ResponseValidator.scala > [INFO] - Formatted: SparkConnectClientParser.scala > [INFO] - Formatted: CustomSparkConnectStub.scala > [INFO] - Formatted: CustomSparkConnectBlockingStub.scala > [INFO] - Formatted: TestUDFs.scala > {code} > This is because the output format has changed due to scalafmt version upgrade.
[jira] [Created] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
Hyukjin Kwon created SPARK-48367: Summary: Fix lint-scala for scalafmt to detect properly Key: SPARK-48367 URL: https://issues.apache.org/jira/browse/SPARK-48367 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} ./build/mvn \ -Pscala-2.13 \ scalafmt:format \ -Dscalafmt.skip=false \ -Dscalafmt.validateOnly=true \ -Dscalafmt.changedOnly=false \ -pl connector/connect/common \ -pl connector/connect/server \ -pl connector/connect/client/jvm {code} fails as below: {code} [INFO] Scalafmt results: 1 of 36 were unformatted [INFO] Details: [INFO] - Requires formatting: ConnectProtoUtils.scala [INFO] - Formatted: UdfUtils.scala [INFO] - Formatted: DataTypeProtoConverter.scala [INFO] - Formatted: ConnectCommon.scala [INFO] - Formatted: ProtoUtils.scala [INFO] - Formatted: Abbreviator.scala [INFO] - Formatted: ProtoDataTypes.scala [INFO] - Formatted: LiteralValueProtoConverter.scala [INFO] - Formatted: InvalidPlanInput.scala [INFO] - Formatted: ForeachWriterPacket.scala [INFO] - Formatted: StreamingListenerPacket.scala [INFO] - Formatted: StorageLevelProtoConverter.scala [INFO] - Formatted: UdfPacket.scala [INFO] - Formatted: ClassFinder.scala [INFO] - Formatted: SparkConnectClient.scala [INFO] - Formatted: GrpcRetryHandler.scala [INFO] - Formatted: GrpcExceptionConverter.scala [INFO] - Formatted: ArrowEncoderUtils.scala [INFO] - Formatted: ScalaCollectionUtils.scala [INFO] - Formatted: ArrowDeserializer.scala [INFO] - Formatted: ArrowVectorReader.scala [INFO] - Formatted: ArrowSerializer.scala [INFO] - Formatted: ConcatenatingArrowStreamReader.scala [INFO] - Formatted: RetryPolicy.scala [INFO] - Formatted: SparkConnectStubState.scala [INFO] - Formatted: ArtifactManager.scala [INFO] - Formatted: SparkResult.scala [INFO] - Formatted: RetriesExceeded.scala [INFO] - Formatted: CloseableIterator.scala [INFO] - Formatted: package.scala [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala [INFO] - Formatted: 
ResponseValidator.scala [INFO] - Formatted: SparkConnectClientParser.scala [INFO] - Formatted: CustomSparkConnectStub.scala [INFO] - Formatted: CustomSparkConnectBlockingStub.scala [INFO] - Formatted: TestUDFs.scala {code} This is because the output format has changed due to scalafmt version upgrade.
[jira] [Resolved] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48363. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46674 [https://github.com/apache/spark/pull/46674] > Cleanup some redundant codes in `from_xml` > -- > > Key: SPARK-48363 > URL: https://issues.apache.org/jira/browse/SPARK-48363 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48363: Assignee: BingKun Pan > Cleanup some redundant codes in `from_xml` > -- > > Key: SPARK-48363 > URL: https://issues.apache.org/jira/browse/SPARK-48363 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48340. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 4 [https://github.com/apache/spark/pull/4] > Support TimestampNTZ infer schema miss prefer_timestamp_ntz > > > Key: SPARK-48340 > URL: https://issues.apache.org/jira/browse/SPARK-48340 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2024-05-20-18-38-39-769.png > > > !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Assigned] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48340: Assignee: angerszhu > Support TimestampNTZ infer schema miss prefer_timestamp_ntz > > > Key: SPARK-48340 > URL: https://issues.apache.org/jira/browse/SPARK-48340 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Attachments: image-2024-05-20-18-38-39-769.png > > > !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Resolved] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
[ https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48258. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46570 [https://github.com/apache/spark/pull/46570] > Implement DataFrame.checkpoint and DataFrame.localCheckpoint > > > Key: SPARK-48258 > URL: https://issues.apache.org/jira/browse/SPARK-48258 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature > parity.
[jira] [Resolved] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`
[ https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48333. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46654 [https://github.com/apache/spark/pull/46654] > Test `test_sorting_functions_with_column` with same `Column` > > > Key: SPARK-48333 > URL: https://issues.apache.org/jira/browse/SPARK-48333 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`
[ https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48333: Assignee: Ruifeng Zheng > Test `test_sorting_functions_with_column` with same `Column` > > > Key: SPARK-48333 > URL: https://issues.apache.org/jira/browse/SPARK-48333 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48319: Assignee: Ruifeng Zheng > Test `assert_true` and `raise_error` with the same error class as Spark > Classic > --- > > Key: SPARK-48319 > URL: https://issues.apache.org/jira/browse/SPARK-48319 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48319. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46633 [https://github.com/apache/spark/pull/46633] > Test `assert_true` and `raise_error` with the same error class as Spark > Classic > --- > > Key: SPARK-48319 > URL: https://issues.apache.org/jira/browse/SPARK-48319 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
[ https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48317: Assignee: Hyukjin Kwon > Enable test_udtf_with_analyze_using_archive and > test_udtf_with_analyze_using_file > - > > Key: SPARK-48317 > URL: https://issues.apache.org/jira/browse/SPARK-48317 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
[ https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48317. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46632 [https://github.com/apache/spark/pull/46632] > Enable test_udtf_with_analyze_using_archive and > test_udtf_with_analyze_using_file > - > > Key: SPARK-48317 > URL: https://issues.apache.org/jira/browse/SPARK-48317 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48316: Assignee: Hyukjin Kwon > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48316. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46629 [https://github.com/apache/spark/pull/46629] > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48316: - Summary: Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition (was: Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition) > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
Hyukjin Kwon created SPARK-48317: Summary: Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file Key: SPARK-48317 URL: https://issues.apache.org/jira/browse/SPARK-48317 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48238: - Parent: (was: SPARK-47970) Issue Type: Bug (was: Sub-task) > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > Labels: pull-request-available > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
> org.sparkproject.jetty.util.MultiException: Multiple exceptions > at > org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) > ~[scala-library-2.13.13.jar:?] > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) > ~[scala-library-2.13.13.jar:?] > at scala.collection.AbstractIterable.foreach(Iterable.scala:935) > ~[scala-library-2.13.13.jar:?] 
> at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.SparkContext.(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) > ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) > [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) >
[jira] [Created] (SPARK-48316) Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition
Hyukjin Kwon created SPARK-48316:
------------------------------------

             Summary: Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition
                 Key: SPARK-48316
                 URL: https://issues.apache.org/jira/browse/SPARK-48316
             Project: Spark
          Issue Type: Sub-task
          Components: Pandas API on Spark, PySpark, Tests
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48310) Cached Properties Should return copies instead of values
[ https://issues.apache.org/jira/browse/SPARK-48310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48310.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46621
[https://github.com/apache/spark/pull/46621]

> Cached Properties Should return copies instead of values
> --------------------------------------------------------
>
>                 Key: SPARK-48310
>                 URL: https://issues.apache.org/jira/browse/SPARK-48310
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Martin Grund
>            Assignee: Martin Grund
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When returning cached properties for schema and columns a user might
> incidentally modify the cached values.
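The fix above concerns Spark Connect's client-side caches for schema and columns. The pattern it protects against can be sketched in plain Python (a hypothetical stand-in class, not the actual Spark code): cache the expensive result once, but hand each caller a copy so mutating the returned value cannot corrupt the cache.

```python
from functools import cached_property
from typing import List


class Relation:
    """Illustrative stand-in for a Connect DataFrame (hypothetical)."""

    def _fetch_columns(self) -> List[str]:
        # Imagine an expensive RPC to the Connect server here.
        return ["id", "name"]

    @cached_property
    def _columns_cached(self) -> List[str]:
        # Computed once, then stored on the instance.
        return self._fetch_columns()

    @property
    def columns(self) -> List[str]:
        # Return a copy so a caller mutating the result cannot
        # corrupt the cached value (the failure mode this issue fixes).
        return list(self._columns_cached)


r = Relation()
cols = r.columns
cols.append("oops")                  # mutate the returned list...
assert r.columns == ["id", "name"]   # ...the cache is unaffected
```

The copy costs O(n) per access, which is the usual trade-off for keeping a cached mutable value safe from callers.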
[jira] [Resolved] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48268.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46571
[https://github.com/apache/spark/pull/46571]

> Add a configuration for SparkContext.setCheckpointDir
> -----------------------------------------------------
>
>                 Key: SPARK-48268
>                 URL: https://issues.apache.org/jira/browse/SPARK-48268
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Would be great to have it
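A configuration lets the checkpoint directory be set at submit time instead of calling `SparkContext.setCheckpointDir` in application code. A sketch of a `spark-defaults.conf` entry follows; the key name `spark.checkpoint.dir` is an assumption here, so verify it against the merged pull request before relying on it:

```
# spark-defaults.conf sketch (key name is an assumption; check PR 46571)
spark.checkpoint.dir    hdfs:///user/spark/checkpoints
```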
[jira] [Assigned] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48268:
------------------------------------

    Assignee: Hyukjin Kwon

> Add a configuration for SparkContext.setCheckpointDir
> -----------------------------------------------------
>
>                 Key: SPARK-48268
>                 URL: https://issues.apache.org/jira/browse/SPARK-48268
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
>
> Would be great to have it
[jira] [Resolved] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48295.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46602
[https://github.com/apache/spark/pull/46602]

> Turn on compute.ops_on_diff_frames by default
> ---------------------------------------------
>
>                 Key: SPARK-48295
>                 URL: https://issues.apache.org/jira/browse/SPARK-48295
>             Project: Spark
>          Issue Type: Improvement
>          Components: PS
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Assigned] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48295:
------------------------------------

    Assignee: Ruifeng Zheng

> Turn on compute.ops_on_diff_frames by default
> ---------------------------------------------
>
>                 Key: SPARK-48295
>                 URL: https://issues.apache.org/jira/browse/SPARK-48295
>             Project: Spark
>          Issue Type: Improvement
>          Components: PS
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Assigned] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema
[ https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48100:
------------------------------------

    Assignee: Shujing Yang

> [SQL][XML] Fix issues in skipping nested structure fields not selected in
> schema
> -------------------------------------------------------------------------
>
>                 Key: SPARK-48100
>                 URL: https://issues.apache.org/jira/browse/SPARK-48100
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Shujing Yang
>            Assignee: Shujing Yang
>            Priority: Major
>              Labels: pull-request-available
>
>
> Previously, the XML parser couldn't effectively skip nested structure data
> fields when they were not selected in the schema. For instance, in the
> example below, `df.select("struct2").collect()` returns `Seq(null)` because
> `struct1` wasn't effectively skipped. This PR fixes the issue.
> {code:java}
> 1
> 2
> {code}
>
[jira] [Resolved] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema
[ https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48100.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46348
[https://github.com/apache/spark/pull/46348]

> [SQL][XML] Fix issues in skipping nested structure fields not selected in
> schema
> -------------------------------------------------------------------------
>
>                 Key: SPARK-48100
>                 URL: https://issues.apache.org/jira/browse/SPARK-48100
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Shujing Yang
>            Assignee: Shujing Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Previously, the XML parser couldn't effectively skip nested structure data
> fields when they were not selected in the schema. For instance, in the
> example below, `df.select("struct2").collect()` returns `Seq(null)` because
> `struct1` wasn't effectively skipped. This PR fixes the issue.
> {code:java}
> 1
> 2
> {code}
>
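The XML tags in the issue's example were stripped by the mail archive, so only the values `1` and `2` survive. The behavior being fixed can be illustrated with a small pure-Python sketch using stdlib `xml.etree` (not Spark's XML parser; the element names `struct1`, `struct2`, and `innerField` are hypothetical): projecting only `struct2` must still walk past the unselected `struct1` subtree rather than losing its place and yielding null.

```python
import xml.etree.ElementTree as ET

# Hypothetical row in the spirit of the (garbled) issue example:
# struct1 is a nested field NOT selected in the schema; struct2 is selected.
doc = """
<ROW>
  <struct1><innerField>1</innerField></struct1>
  <struct2>2</struct2>
</ROW>
"""


def project(xml_text, selected):
    """Parse one row, keeping only the selected top-level fields while
    skipping (but still fully consuming) every other subtree."""
    row = ET.fromstring(xml_text)
    out = {}
    for child in row:
        if child.tag in selected:
            out[child.tag] = child.text.strip() if child.text else None
        # else: the whole subtree (struct1/innerField) is skipped intact
    return out


assert project(doc, {"struct2"}) == {"struct2": "2"}
```

The bug report describes the opposite outcome in the pre-fix parser: the unselected nested field was not consumed cleanly, so the selected field came back as null.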
[jira] [Resolved] (SPARK-48247) Use all values in a python dict when inferring MapType schema
[ https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48247.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46547
[https://github.com/apache/spark/pull/46547]

> Use all values in a python dict when inferring MapType schema
> -------------------------------------------------------------
>
>                 Key: SPARK-48247
>                 URL: https://issues.apache.org/jira/browse/SPARK-48247
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Similar to SPARK-39168
[jira] [Assigned] (SPARK-48247) Use all values in a python dict when inferring MapType schema
[ https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48247:
------------------------------------

    Assignee: Hyukjin Kwon

> Use all values in a python dict when inferring MapType schema
> -------------------------------------------------------------
>
>                 Key: SPARK-48247
>                 URL: https://issues.apache.org/jira/browse/SPARK-48247
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
>
> Similar to SPARK-39168
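The point of SPARK-48247 is that inferring a MapType's value type from only the first value of a dict produces the wrong type when later values differ. A toy pure-Python sketch of the idea follows (this is not PySpark's actual inference code, and the merge rules here are deliberately simplified: identical types stay, int/float widen to float, anything else widens to str):

```python
def merge_type(acc, t):
    """Toy type merge: identical types stay, int/float widen to float,
    mixed types otherwise widen to str. (PySpark's real rules are richer.)"""
    if acc is None or acc is t:
        return t
    if {acc, t} == {int, float}:
        return float
    return str


def infer_map_value_type(d):
    """Inspect EVERY value of the dict, not just the first one, so a dict
    like {"a": 1, "b": "x"} does not get inferred as an int-valued map."""
    inferred = None
    for v in d.values():
        inferred = merge_type(inferred, type(v))
    return inferred


assert infer_map_value_type({"a": 1, "b": 2}) is int
assert infer_map_value_type({"a": 1, "b": 2.5}) is float
assert infer_map_value_type({"a": 1, "b": "x"}) is str
```

Looking only at the first value would have returned `int` in all three cases; merging across every value is what the fix (like SPARK-39168 before it for other container types) achieves.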
[jira] [Resolved] (SPARK-48266) Move o.a.spark.sql.connect.dsl to test dir
[ https://issues.apache.org/jira/browse/SPARK-48266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48266.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46567
[https://github.com/apache/spark/pull/46567]

> Move o.a.spark.sql.connect.dsl to test dir
> ------------------------------------------
>
>                 Key: SPARK-48266
>                 URL: https://issues.apache.org/jira/browse/SPARK-48266
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
[jira] [Created] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
Hyukjin Kwon created SPARK-48268:
------------------------------------

             Summary: Add a configuration for SparkContext.setCheckpointDir
                 Key: SPARK-48268
                 URL: https://issues.apache.org/jira/browse/SPARK-48268
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

Would be great to have it