[jira] [Updated] (SPARK-47993) Drop Python 3.8 support

2024-04-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47993:
-
Labels: release-notes  (was: release-note)

> Drop Python 3.8 support
> ---
>
> Key: SPARK-47993
> URL: https://issues.apache.org/jira/browse/SPARK-47993
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: release-notes
>
> Python 3.8 reaches end of life this October. Given the release schedule, we 
> should drop it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47962) Improve doc test in pyspark dataframe

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47962.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46189
[https://github.com/apache/spark/pull/46189]

> Improve doc test in pyspark dataframe
> -
>
> Key: SPARK-47962
> URL: https://issues.apache.org/jira/browse/SPARK-47962
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The doc test for the DataFrame observe API doesn't use a streaming DataFrame, 
> which is wrong. We should start a streaming DataFrame to make sure it runs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47965.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46197
[https://github.com/apache/spark/pull/46197]

> Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}
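The motivation behind the diff above can be illustrated outside Spark: the old converter silently turns a null input into a null output, while the replacement assumes the value is never null. This is a toy Python sketch with hypothetical names, not Spark's actual code:

```python
# Toy contrast (hypothetical names, not Spark code): a null-tolerant
# stringifier modeled on Option(_).map(_.toString).orNull versus a strict
# one modeled on the replacement { v: T => v.toString }.
def stringify_or_null(value):
    # old behavior: None in, None out -- a null can silently leak into
    # downstream config state
    return None if value is None else str(value)

def stringify(value):
    # new behavior: relies on the invariant that config values are never
    # null, so no null-handling branch is needed at all
    return str(value)

assert stringify_or_null(None) is None   # the silent null pass-through
assert stringify_or_null(42) == "42"
assert stringify(42) == "42"
```

Since configuration keys and values can never be null in the first place, the null-tolerant branch is dead weight and only obscures the invariant.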



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47965:


Assignee: Hyukjin Kwon

> Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47964) Hide SQLContext and HiveContext

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47964.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46194
[https://github.com/apache/spark/pull/46194]

> Hide SQLContext and HiveContext
> ---
>
> Key: SPARK-47964
> URL: https://issues.apache.org/jira/browse/SPARK-47964
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder

2024-04-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47965:
-
Issue Type: Improvement  (was: Bug)

> Avoid orNull in TypedConfigBuilder
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry

2024-04-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47965:
-
Summary: Avoid orNull in TypedConfigBuilder and OptionalConfigEntry  (was: 
Avoid orNull in TypedConfigBuilder)

> Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder

2024-04-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47965:
-
Priority: Minor  (was: Major)

> Avoid orNull in TypedConfigBuilder
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47965) Avoid orNull in TypedConfigBuilder

2024-04-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47965:


 Summary: Avoid orNull in TypedConfigBuilder
 Key: SPARK-47965
 URL: https://issues.apache.org/jira/browse/SPARK-47965
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Configuration values/keys cannot be nulls. We should fix:

{code}
diff --git 
a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
index 1f19e9444d38..d06535722625 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
@@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
   import ConfigHelpers._

   def this(parent: ConfigBuilder, converter: String => T) = {
-this(parent, converter, Option(_).map(_.toString).orNull)
+this(parent, converter, { v: T => v.toString })
   }

   /** Apply a transformation to the user-provided values of the config entry. 
*/
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47933:


Assignee: Hyukjin Kwon

> Parent Column class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-47933
> URL: https://issues.apache.org/jira/browse/SPARK-47933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47933.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46155
[https://github.com/apache/spark/pull/46155]

> Parent Column class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-47933
> URL: https://issues.apache.org/jira/browse/SPARK-47933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47903) Add remaining scalar types to the Python variant library

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47903.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46122
[https://github.com/apache/spark/pull/46122]

> Add remaining scalar types to the Python variant library
> 
>
> Key: SPARK-47903
> URL: https://issues.apache.org/jira/browse/SPARK-47903
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Added support for reading the remaining scalar data types (binary, timestamp, 
> timestamp_ntz, date, float) to the Python Variant library.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47903) Add remaining scalar types to the Python variant library

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47903:


Assignee: Harsh Motwani

> Add remaining scalar types to the Python variant library
> 
>
> Key: SPARK-47903
> URL: https://issues.apache.org/jira/browse/SPARK-47903
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
>
> Added support for reading the remaining scalar data types (binary, timestamp, 
> timestamp_ntz, date, float) to the Python Variant library.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47890) Add python and scala dataframe variant expression aliases.

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47890.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46123
[https://github.com/apache/spark/pull/46123]

> Add python and scala dataframe variant expression aliases.
> --
>
> Key: SPARK-47890
> URL: https://issues.apache.org/jira/browse/SPARK-47890
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47933:


 Summary: Parent Column class for Spark Connect and Spark Classic
 Key: SPARK-47933
 URL: https://issues.apache.org/jira/browse/SPARK-47933
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839436#comment-17839436
 ] 

Hyukjin Kwon commented on SPARK-47909:
--

Yes, I am working on it today :-).

> Parent DataFrame class for Spark Connect and Spark Classic
> --
>
> Key: SPARK-47909
> URL: https://issues.apache.org/jira/browse/SPARK-47909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47909.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46129
[https://github.com/apache/spark/pull/46129]

> Parent DataFrame class for Spark Connect and Spark Classic
> --
>
> Key: SPARK-47909
> URL: https://issues.apache.org/jira/browse/SPARK-47909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47909:


Assignee: Hyukjin Kwon

> Parent DataFrame class for Spark Connect and Spark Classic
> --
>
> Key: SPARK-47909
> URL: https://issues.apache.org/jira/browse/SPARK-47909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-18 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47909:


 Summary: Parent DataFrame class for Spark Connect and Spark Classic
 Key: SPARK-47909
 URL: https://issues.apache.org/jira/browse/SPARK-47909
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47767:


Assignee: guihuawen

> Show offset value in TakeOrderedAndProjectExec
> --
>
> Key: SPARK-47767
> URL: https://issues.apache.org/jira/browse/SPARK-47767
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: guihuawen
>Assignee: guihuawen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Show the offset value in TakeOrderedAndProjectExec.
>  
> For example:
>  
> explain select * from test_limit_offset order by a limit 2 offset 1;
> plan
> == Physical Plan ==
> TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], 
> output=[a#171])
> +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
> HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition 
> Cols: []]
>  
> No offset is displayed. Displaying it would be more user-friendly.
>  
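Note that the plan above reports limit=3 for "limit 2 offset 1": the operator fetches the top (limit + offset) rows and then drops the first offset rows. A minimal sketch of that semantics, assuming this reading of the plan (a toy Python model, not Spark's implementation):

```python
# Toy model of TakeOrderedAndProject for "ORDER BY a LIMIT 2 OFFSET 1":
# fetch the top (limit + offset) rows in sort order, then skip `offset`.
def take_ordered_and_project(rows, limit, offset):
    top = sorted(rows)[: limit + offset]  # top-k fetch with k = limit + offset
    return top[offset:]                   # apply the offset afterwards

# sorted order is [1, 2, 3, 4, 5]; top 3 = [1, 2, 3]; skip 1 -> [2, 3]
assert take_ordered_and_project([5, 1, 4, 2, 3], limit=2, offset=1) == [2, 3]
```

Surfacing the offset in the plan string would make it clear why the displayed limit differs from the query's LIMIT clause.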



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47767.
--
Resolution: Fixed

Issue resolved by pull request 45931
[https://github.com/apache/spark/pull/45931]

> Show offset value in TakeOrderedAndProjectExec
> --
>
> Key: SPARK-47767
> URL: https://issues.apache.org/jira/browse/SPARK-47767
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: guihuawen
>Assignee: guihuawen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Show the offset value in TakeOrderedAndProjectExec.
>  
> For example:
>  
> explain select * from test_limit_offset order by a limit 2 offset 1;
> plan
> == Physical Plan ==
> TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], 
> output=[a#171])
> +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], 
> HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition 
> Cols: []]
>  
> No offset is displayed. Displaying it would be more user-friendly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47852) Support DataFrameQueryContext for reverse operations

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47852:


Assignee: Haejoon Lee

> Support DataFrameQueryContext for reverse operations
> 
>
> Key: SPARK-47852
> URL: https://issues.apache.org/jira/browse/SPARK-47852
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> To improve error messages for reverse operations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47858) Refactoring the structure for DataFrame error context

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47858.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46063
[https://github.com/apache/spark/pull/46063]

> Refactoring the structure for DataFrame error context
> -
>
> Key: SPARK-47858
> URL: https://issues.apache.org/jira/browse/SPARK-47858
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The current implementation of the PySpark DataFrame error context has some 
> hacky spots; refactoring it would make the structure more flexible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47852) Support DataFrameQueryContext for reverse operations

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47852.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46063
[https://github.com/apache/spark/pull/46063]

> Support DataFrameQueryContext for reverse operations
> 
>
> Key: SPARK-47852
> URL: https://issues.apache.org/jira/browse/SPARK-47852
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> To improve error messages for reverse operations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47858) Refactoring the structure for DataFrame error context

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47858:


Assignee: Haejoon Lee

> Refactoring the structure for DataFrame error context
> -
>
> Key: SPARK-47858
> URL: https://issues.apache.org/jira/browse/SPARK-47858
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of the PySpark DataFrame error context has some 
> hacky spots; refactoring it would make the structure more flexible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47864.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46096
[https://github.com/apache/spark/pull/46096]

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Like the Installation page of pandas, we should cover all installable 
> options and their related dependencies in our Installation documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47864:


Assignee: Haejoon Lee

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Like the Installation page of pandas, we should cover all installable 
> options and their related dependencies in our Installation documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47891) Improve docstring of mapInPandas

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47891.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46108
[https://github.com/apache/spark/pull/46108]

> Improve docstring of mapInPandas
> 
>
> Key: SPARK-47891
> URL: https://issues.apache.org/jira/browse/SPARK-47891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Improve the docstring of mapInPandas:
>  * "using a Python native function that takes and outputs a pandas DataFrame" 
> is confusing because the function takes and outputs an ITERATOR of pandas 
> DataFrames instead.
>  * "All columns are passed together as an iterator of pandas DataFrames" 
> easily misleads users into thinking the entire DataFrame will be passed 
> together; "a batch of rows" is used instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47830) Reenable ResourceProfileTests for pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47830.
--
Fix Version/s: 4.0.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

fixed in https://github.com/apache/spark/pull/46090

> Reenable ResourceProfileTests for pyspark-connect
> -
>
> Key: SPARK-47830
> URL: https://issues.apache.org/jira/browse/SPARK-47830
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47885.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46100
[https://github.com/apache/spark/pull/46100]

> Make pyspark.resource compatible with pyspark-connect
> -
>
> Key: SPARK-47885
> URL: https://issues.apache.org/jira/browse/SPARK-47885
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47540) SPIP: Pure Python Package (Spark Connect)

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47540.
--
Fix Version/s: 4.0.0
 Assignee: Hyukjin Kwon
   Resolution: Done

> SPIP: Pure Python Package (Spark Connect)
> -
>
> Key: SPARK-47540
> URL: https://issues.apache.org/jira/browse/SPARK-47540
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
> Fix For: 4.0.0
>
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely 
> no jargon.*
> As part of the [Spark 
> Connect|https://spark.apache.org/docs/latest/spark-connect-overview.html] 
> development, we have introduced Scala and Python clients. While the Scala 
> client is already provided as a separate library and is available in Maven, 
> the Python client is not. This proposal aims for end users to install the 
> pure Python package for Spark Connect by using pip install pyspark-connect.
> The pure Python package contains only Python source code without jars, which 
> reduces the size of the package significantly and widens the use cases of 
> PySpark. See also [Introducing Spark Connect - The Power of Apache Spark, 
> Everywhere'|https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html].
> *Q2. What problem is this proposal NOT designed to solve?*
> This proposal does not aim to:
> - Change the existing PySpark package, e.g., pip install pyspark is not 
> affected
> - Implement full compatibility with classic PySpark, e.g., implementing the 
> RDD API
> - Address how to launch the Spark Connect server; the server is launched by 
> users themselves
> - Support local mode; without launching a Spark Connect server, users cannot 
> use this package
> - Change the [official release channel|https://spark.apache.org/downloads.html]; 
> only PyPI is affected
> *Q3. How is it done today, and what are the limits of current practice?*
> Currently, we run pip install pyspark, and the package is over 300MB because 
> of the dependent jars. In addition, PySpark requires setting up a non-Python 
> environment such as a JDK installation.
> This is not suitable when the running environment and resources are limited, 
> for example on edge devices such as smart home devices.
> Requiring a non-Python environment is also not Python friendly.
> *Q4. What is new in your approach and why do you think it will be successful?*
> It provides a pure Python library, which eliminates other environment 
> requirements such as the JDK, reduces resource usage by decoupling the Spark 
> Driver, and reduces the package size.
> *Q5. Who cares? If you are successful, what difference will it make?*
> Users who want to leverage Spark in a limited environment, and who want to 
> decouple the JVM-based Spark Driver to run Spark as a service. They can 
> simply pip install pyspark-connect, which does not require other dependencies 
> (except Python dependencies, just like other Python libraries). 
> *Q6. What are the risks?*
> Because we do not change the existing PySpark package, I do not see any major 
> risk in classic PySpark itself. We will reuse the same Python source, and 
> therefore we must make sure no Py4J is used and no JVM access is made. 
> This requirement might confuse developers. At the very least, we should add a 
> dedicated CI job to make sure the pure Python package works.
> *Q7. How long will it take?*
> I expect around one month including CI set up. In fact, the prototype is 
> ready so I expect this to be done sooner.
> *Q8. What are the mid-term and final “exams” to check for success?*
> The mid-term goal is to set up a scheduled CI job that builds the pure Python 
> library and runs all the tests against it.
> The final goal would be to properly test the end-to-end use case from pip 
> installation.
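The intended end-user flow can be sketched as follows (a hedged sketch, not official usage documentation: it assumes pyspark-connect is installed via pip install pyspark-connect and that a Spark Connect server is already running at the given address, since this package does not launch one):

```python
# Hypothetical usage sketch: the pure Python client only needs the address of
# an already-running Spark Connect server -- no JDK and no bundled jars.
from pyspark.sql import SparkSession

# "sc://localhost:15002" is an assumed server address for illustration.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.range(5).show()
```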






[jira] [Resolved] (SPARK-47884) Switch ANSI SQL CI job to NON-ANSI SQL CI job

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47884.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46099
[https://github.com/apache/spark/pull/46099]

> Switch ANSI SQL CI job to NON-ANSI SQL CI job
> -
>
> Key: SPARK-47884
> URL: https://issues.apache.org/jira/browse/SPARK-47884
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47807) Make pyspark.ml compatible with pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47807:
-
Summary: Make pyspark.ml compatible with pyspark-connect  (was: Make 
pyspark.ml compatible witbh pyspark-connect)

> Make pyspark.ml compatible with pyspark-connect
> ---
>
> Key: SPARK-47807
> URL: https://issues.apache.org/jira/browse/SPARK-47807
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47885:


 Summary: Make pyspark.resource compatible with pyspark-connect
 Key: SPARK-47885
 URL: https://issues.apache.org/jira/browse/SPARK-47885
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-46375) Add documentation for Python data source API

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46375.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46089
[https://github.com/apache/spark/pull/46089]

> Add documentation for Python data source API
> 
>
> Key: SPARK-46375
> URL: https://issues.apache.org/jira/browse/SPARK-46375
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add documentation (user guide) for the Python data source API.
>  
> Note the documentation should clarify the required dependency: pyarrow






[jira] [Assigned] (SPARK-46375) Add documentation for Python data source API

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46375:


Assignee: Allison Wang

> Add documentation for Python data source API
> 
>
> Key: SPARK-46375
> URL: https://issues.apache.org/jira/browse/SPARK-46375
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Add documentation (user guide) for the Python data source API.
>  
> Note the documentation should clarify the required dependency: pyarrow






[jira] [Resolved] (SPARK-47877) Speed up test_parity_listener

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47877.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46072
[https://github.com/apache/spark/pull/46072]

> Speed up test_parity_listener
> -
>
> Key: SPARK-47877
> URL: https://issues.apache.org/jira/browse/SPARK-47877
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47760) Reenable Avro function doctests

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47760.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46055
[https://github.com/apache/spark/pull/46055]

> Reenable Avro function doctests
> ---
>
> Key: SPARK-47760
> URL: https://issues.apache.org/jira/browse/SPARK-47760
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47763) Reenable Protobuf function doctests

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47763.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46055
[https://github.com/apache/spark/pull/46055]

> Reenable Protobuf function doctests
> ---
>
> Key: SPARK-47763
> URL: https://issues.apache.org/jira/browse/SPARK-47763
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47818:


Assignee: Xi Lyu

> Introduce plan cache in SparkConnectPlanner to improve performance of Analyze 
> requests
> --
>
> Key: SPARK-47818
> URL: https://issues.apache.org/jira/browse/SPARK-47818
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Assignee: Xi Lyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> While building a DataFrame step by step, each intermediate DataFrame is 
> generated with an empty schema that is lazily computed on access. However, 
> if a user's code frequently accesses the schema of these new DataFrames using 
> methods such as `df.columns`, it will result in a large number of Analyze 
> requests to the server. Each time, the entire plan needs to be reanalyzed, 
> leading to poor performance, especially when constructing highly complex 
> plans.
> Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the 
> overhead of repeated analysis during this process. This is achieved by saving 
> significant computation when the resolved logical plan of a subtree can be 
> cached.
> A minimal example of the problem:
> {code:python}
> import pyspark.sql.functions as F
> df = spark.range(10)
> for i in range(200):
>     if str(i) not in df.columns:  # <-- causes a new Analyze request in every iteration
>         df = df.withColumn(str(i), F.col("id") + i)
> df.show()
> {code}
> With this patch, the performance of the above code improved from ~110s to ~5s.
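The caching idea can be illustrated in plain Python (a hedged sketch, not the actual SparkConnectPlanner code; `analyze` and the tuple-based plan encoding are stand-ins): memoize an expensive analysis step keyed by an immutable plan description, so repeated schema lookups on the same subtree are served from the cache.

```python
from functools import lru_cache

@lru_cache(maxsize=16)  # bounded, like a per-session plan cache
def analyze(plan):
    """Stand-in for server-side analysis: derive column names from a plan."""
    cols = ["id"]
    for op, name in plan:  # plan is a tuple of (operation, column) steps
        if op == "withColumn":
            cols.append(name)
    return tuple(cols)

plan = ()  # immutable plan description, so it is hashable and cacheable
for i in range(5):
    if str(i) not in analyze(plan):  # schema lookup; repeats hit the cache
        plan += (("withColumn", str(i)),)

print(analyze(plan))  # ('id', '0', '1', '2', '3', '4')
```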






[jira] [Resolved] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests

2024-04-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47818.
--
Resolution: Fixed

Issue resolved by pull request 46012
[https://github.com/apache/spark/pull/46012]

> Introduce plan cache in SparkConnectPlanner to improve performance of Analyze 
> requests
> --
>
> Key: SPARK-47818
> URL: https://issues.apache.org/jira/browse/SPARK-47818
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Assignee: Xi Lyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> While building a DataFrame step by step, each intermediate DataFrame is 
> generated with an empty schema that is lazily computed on access. However, 
> if a user's code frequently accesses the schema of these new DataFrames using 
> methods such as `df.columns`, it will result in a large number of Analyze 
> requests to the server. Each time, the entire plan needs to be reanalyzed, 
> leading to poor performance, especially when constructing highly complex 
> plans.
> Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the 
> overhead of repeated analysis during this process. This is achieved by saving 
> significant computation when the resolved logical plan of a subtree can be 
> cached.
> A minimal example of the problem:
> {code:python}
> import pyspark.sql.functions as F
> df = spark.range(10)
> for i in range(200):
>     if str(i) not in df.columns:  # <-- causes a new Analyze request in every iteration
>         df = df.withColumn(str(i), F.col("id") + i)
> df.show()
> {code}
> With this patch, the performance of the above code improved from ~110s to ~5s.






[jira] [Resolved] (SPARK-47862) Connect generated protos can't be pickled

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47862.
--
Resolution: Fixed

Issue resolved by pull request 46068
[https://github.com/apache/spark/pull/46068]

> Connect generated protos can't be pickled
> -
>
> Key: SPARK-47862
> URL: https://issues.apache.org/jira/browse/SPARK-47862
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When Spark Connect generates the protobuf files, they're manually adjusted 
> and moved to the right folder. However, we did not fix the package for the 
> descriptor. This breaks serializing them to proto.






[jira] [Assigned] (SPARK-47862) Connect generated protos can't be pickled

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47862:


Assignee: Martin Grund

> Connect generated protos can't be pickled
> -
>
> Key: SPARK-47862
> URL: https://issues.apache.org/jira/browse/SPARK-47862
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When Spark Connect generates the protobuf files, they're manually adjusted 
> and moved to the right folder. However, we did not fix the package for the 
> descriptor. This breaks serializing them to proto.






[jira] [Assigned] (SPARK-47371) XML: Ignore row tags in CDATA Tokenizer

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47371:


Assignee: Yousof Hosny

> XML: Ignore row tags in CDATA Tokenizer
> ---
>
> Key: SPARK-47371
> URL: https://issues.apache.org/jira/browse/SPARK-47371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Assignee: Yousof Hosny
>Priority: Minor
>  Labels: pull-request-available
>
> The current parser does not recognize CDATA sections and thus will read row 
> tags that are enclosed within a CDATA section. The expected behavior is for 
> none of the following rows to be read, but they are all read. 
> {code:scala}
> // BUG:  rowTag in CDATA section
> val xmlString="""
> 
> 
> {code}
>  
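The distinction can be shown with the Python standard library's XML parser (an illustration under an assumed input, not the Spark XML tokenizer): a naive substring scan counts a row tag appearing inside a CDATA section, while a conforming parser treats CDATA content as character data and ignores it.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one real <row> element plus a <row> inside CDATA.
xml_string = ("<rows><row><a>1</a></row>"
              "<note><![CDATA[<row>not a real row</row>]]></note></rows>")

naive_count = xml_string.count("<row>")                       # 2: CDATA included
parsed_count = len(ET.fromstring(xml_string).findall("row"))  # 1: CDATA ignored
print(naive_count, parsed_count)  # 2 1
```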






[jira] [Resolved] (SPARK-47371) XML: Ignore row tags in CDATA Tokenizer

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47371.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45487
[https://github.com/apache/spark/pull/45487]

> XML: Ignore row tags in CDATA Tokenizer
> ---
>
> Key: SPARK-47371
> URL: https://issues.apache.org/jira/browse/SPARK-47371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Assignee: Yousof Hosny
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The current parser does not recognize CDATA sections and thus will read row 
> tags that are enclosed within a CDATA section. The expected behavior is for 
> none of the following rows to be read, but they are all read. 
> {code:java}
> // BUG:  rowTag in CDATA section
> val xmlString="""
> 
> 
> {code}
>  






[jira] [Resolved] (SPARK-47866) Deflaky PythonForeachWriterSuite

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47866.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46070
[https://github.com/apache/spark/pull/46070]

> Deflaky PythonForeachWriterSuite
> 
>
> Key: SPARK-47866
> URL: https://issues.apache.org/jira/browse/SPARK-47866
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47866) Deflaky PythonForeachWriterSuite

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47866:


Assignee: Dongjoon Hyun

> Deflaky PythonForeachWriterSuite
> 
>
> Key: SPARK-47866
> URL: https://issues.apache.org/jira/browse/SPARK-47866
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47851) Document pyspark-connect package

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47851.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46054
[https://github.com/apache/spark/pull/46054]

> Document pyspark-connect package
> 
>
> Key: SPARK-47851
> URL: https://issues.apache.org/jira/browse/SPARK-47851
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47851) Document pyspark-connect package

2024-04-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47851:


Assignee: Hyukjin Kwon

> Document pyspark-connect package
> 
>
> Key: SPARK-47851
> URL: https://issues.apache.org/jira/browse/SPARK-47851
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47851) Document pyspark-connect package

2024-04-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47851:


 Summary: Document pyspark-connect package
 Key: SPARK-47851
 URL: https://issues.apache.org/jira/browse/SPARK-47851
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-47757) Reenable MemoryProfilerParityTests for pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47757:
-
Summary: Reenable MemoryProfilerParityTests for pyspark-connect  (was: 
Reenable ResourceProfileTests for pyspark-connect)

> Reenable MemoryProfilerParityTests for pyspark-connect
> --
>
> Key: SPARK-47757
> URL: https://issues.apache.org/jira/browse/SPARK-47757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47830) Reenable ResourceProfileTests for pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47830:
-
Summary: Reenable ResourceProfileTests for pyspark-connect  (was: Reenable 
MemoryProfilerParityTests for pyspark-connect)

> Reenable ResourceProfileTests for pyspark-connect
> -
>
> Key: SPARK-47830
> URL: https://issues.apache.org/jira/browse/SPARK-47830
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-47849) Change release script to release pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47849.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46049
[https://github.com/apache/spark/pull/46049]

> Change release script to release pyspark-connect
> 
>
> Key: SPARK-47849
> URL: https://issues.apache.org/jira/browse/SPARK-47849
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47849) Change release script to release pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47849:


Assignee: Hyukjin Kwon

> Change release script to release pyspark-connect
> 
>
> Key: SPARK-47849
> URL: https://issues.apache.org/jira/browse/SPARK-47849
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47757) Reenable ResourceProfileTests for pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47757.
--
Fix Version/s: 4.0.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/46036

> Reenable ResourceProfileTests for pyspark-connect
> -
>
> Key: SPARK-47757
> URL: https://issues.apache.org/jira/browse/SPARK-47757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47756) Reenable UDFProfilerParityTests for pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47756.
--
Fix Version/s: 4.0.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/46036

> Reenable UDFProfilerParityTests for pyspark-connect
> ---
>
> Key: SPARK-47756
> URL: https://issues.apache.org/jira/browse/SPARK-47756
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47849) Change release script to release pyspark-connect

2024-04-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47849:


 Summary: Change release script to release pyspark-connect
 Key: SPARK-47849
 URL: https://issues.apache.org/jira/browse/SPARK-47849
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47812) Support Serializing Spark Sessions in ForEachBatch

2024-04-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47812.
--
Resolution: Fixed

Issue resolved by pull request 46002
[https://github.com/apache/spark/pull/46002]

> Support Serializing Spark Sessions in ForEachBatch
> --
>
> Key: SPARK-47812
> URL: https://issues.apache.org/jira/browse/SPARK-47812
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SparkSessions using Connect should be serialized when used in ForEachBatch 
> and friends.






[jira] [Assigned] (SPARK-47812) Support Serializing Spark Sessions in ForEachBatch

2024-04-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47812:


Assignee: Martin Grund

> Support Serializing Spark Sessions in ForEachBatch
> --
>
> Key: SPARK-47812
> URL: https://issues.apache.org/jira/browse/SPARK-47812
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SparkSessions using Connect should be serialized when used in ForEachBatch 
> and friends.






[jira] [Resolved] (SPARK-47831) Run Pandas API on Spark for pyspark-connect package

2024-04-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47831.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46001
[https://github.com/apache/spark/pull/46001]

> Run Pandas API on Spark for pyspark-connect package
> ---
>
> Key: SPARK-47831
> URL: https://issues.apache.org/jira/browse/SPARK-47831
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47831) Run Pandas API on Spark for pyspark-connect package

2024-04-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47831:


 Summary: Run Pandas API on Spark for pyspark-connect package
 Key: SPARK-47831
 URL: https://issues.apache.org/jira/browse/SPARK-47831
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-47830) Re-enable MemoryProfilerParityTests for pyspark-connect

2024-04-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47830:


 Summary: Re-enable MemoryProfilerParityTests for pyspark-connect
 Key: SPARK-47830
 URL: https://issues.apache.org/jira/browse/SPARK-47830
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47827) Missing warnings for deprecated features

2024-04-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47827.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46021
[https://github.com/apache/spark/pull/46021]

> Missing warnings for deprecated features
> 
>
> Key: SPARK-47827
> URL: https://issues.apache.org/jira/browse/SPARK-47827
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There are some APIs that will be removed but are missing deprecation warnings






[jira] [Assigned] (SPARK-47827) Missing warnings for deprecated features

2024-04-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47827:


Assignee: Haejoon Lee

> Missing warnings for deprecated features
> 
>
> Key: SPARK-47827
> URL: https://issues.apache.org/jira/browse/SPARK-47827
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> There are some APIs that will be removed but are missing deprecation warnings






[jira] [Resolved] (SPARK-47174) Client Side Listener - Server side implementation

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47174.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45988
[https://github.com/apache/spark/pull/45988]

> Client Side Listener - Server side implementation
> -
>
> Key: SPARK-47174
> URL: https://issues.apache.org/jira/browse/SPARK-47174
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47174) Client Side Listener - Server side implementation

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47174:


Assignee: Wei Liu

> Client Side Listener - Server side implementation
> -
>
> Key: SPARK-47174
> URL: https://issues.apache.org/jira/browse/SPARK-47174
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47824) Nondeterminism in pyspark.pandas.series.asof

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47824.
--
Fix Version/s: 3.4.3
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46018
[https://github.com/apache/spark/pull/46018]

> Nondeterminism in pyspark.pandas.series.asof
> 
>
> Key: SPARK-47824
> URL: https://issues.apache.org/jira/browse/SPARK-47824
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>Reporter: Mark Jarvin
>Assignee: Mark Jarvin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3, 3.5.2, 4.0.0
>
>
> `max_by` in `pyspark.pandas.series.asof` uses a literal string instead of a 
> generated column as its ordering condition, resulting in nondeterminism.
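For reference, `asof` returns the last non-null value whose index label is at or before the requested label; a minimal pure-Python sketch of those semantics (a hypothetical helper for illustration, not the pandas-on-Spark implementation):

```python
def asof(pairs, where):
    # Mimic pandas Series.asof semantics: return the value of the last
    # (index, value) pair with index <= `where` whose value is not None.
    result = None
    for idx, val in sorted(pairs):
        if idx > where:
            break
        if val is not None:
            result = val
    return result

print(asof([(10, 1), (20, 2), (30, None), (40, 4)], 35))  # prints 2
```

With a nondeterministic ordering condition, as in the bug above, repeated runs of `max_by` need not agree with these semantics; a deterministic ordering column restores them.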






[jira] [Assigned] (SPARK-47824) Nondeterminism in pyspark.pandas.series.asof

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47824:


Assignee: Mark Jarvin

> Nondeterminism in pyspark.pandas.series.asof
> 
>
> Key: SPARK-47824
> URL: https://issues.apache.org/jira/browse/SPARK-47824
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>Reporter: Mark Jarvin
>Assignee: Mark Jarvin
>Priority: Major
>  Labels: pull-request-available
>
> `max_by` in `pyspark.pandas.series.asof` uses a literal string instead of a 
> generated column as its ordering condition, resulting in nondeterminism.






[jira] [Assigned] (SPARK-47826) Add VariantVal for PySpark

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47826:


Assignee: Gene Pang

> Add VariantVal for PySpark
> --
>
> Key: SPARK-47826
> URL: https://issues.apache.org/jira/browse/SPARK-47826
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Gene Pang
>Assignee: Gene Pang
>Priority: Major
> Fix For: 4.0.0
>
>
> Add a `VariantVal` implementation for PySpark. It includes convenience 
> methods to convert the Variant to a string, or to a Python object, so that 
> users can more easily work with Variant data.
>  






[jira] [Resolved] (SPARK-47826) Add VariantVal for PySpark

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47826.
--
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/45826

> Add VariantVal for PySpark
> --
>
> Key: SPARK-47826
> URL: https://issues.apache.org/jira/browse/SPARK-47826
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Gene Pang
>Priority: Major
> Fix For: 4.0.0
>
>
> Add a `VariantVal` implementation for PySpark. It includes convenience 
> methods to convert the Variant to a string, or to a Python object, so that 
> users can more easily work with Variant data.
>  






[jira] [Resolved] (SPARK-47811) Run ML tests for pyspark-connect package

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47811.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45941
[https://github.com/apache/spark/pull/45941]

> Run ML tests for pyspark-connect package
> 
>
> Key: SPARK-47811
> URL: https://issues.apache.org/jira/browse/SPARK-47811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47811) Run ML tests for pyspark-connect package

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47811:


Assignee: Hyukjin Kwon

> Run ML tests for pyspark-connect package
> 
>
> Key: SPARK-47811
> URL: https://issues.apache.org/jira/browse/SPARK-47811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47811) Run ML tests for pyspark-connect package

2024-04-11 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47811:


 Summary: Run ML tests for pyspark-connect package
 Key: SPARK-47811
 URL: https://issues.apache.org/jira/browse/SPARK-47811
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47807) Make pyspark.ml compatible with pyspark-connect

2024-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47807.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45995
[https://github.com/apache/spark/pull/45995]

> Make pyspark.ml compatible with pyspark-connect
> 
>
> Key: SPARK-47807
> URL: https://issues.apache.org/jira/browse/SPARK-47807
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47704) JSON parsing fails with "java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to org.apache.spark.sql.catalyst.util.ArrayDa

2024-04-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47704:


Assignee: Ivan Sadikov

> JSON parsing fails with "java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to 
> org.apache.spark.sql.catalyst.util.ArrayData" when 
> spark.sql.json.enablePartialResults is enabled
> ---
>
> Key: SPARK-47704
> URL: https://issues.apache.org/jira/browse/SPARK-47704
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
>
> When reading the following JSON {"a":[{"key":{"b":0}}]}:
> {code:java}
> val df = spark.read.schema("a array<map<string, struct<b boolean>>>").json(path){code}
> Spark throws an exception:
> {code:java}
> Cause: java.lang.ClassCastException: class 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to class 
> org.apache.spark.sql.catalyst.util.ArrayData 
> (org.apache.spark.sql.catalyst.util.ArrayBasedMapData and 
> org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader 
> 'app')
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:53)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:53)
> at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:172)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:605)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:884)
> at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) {code}
>  
> The same happens for map: {"a":{"key":[{"b":0}]}} when array and map types
> are swapped.
> {code:java}
> val df = spark.read.schema("a map<string, array<struct<b boolean>>>").json(path) {code}
>  
> This is a corner case that https://issues.apache.org/jira/browse/SPARK-44940 
> missed.






[jira] [Resolved] (SPARK-47704) JSON parsing fails with "java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to org.apache.spark.sql.catalyst.util.ArrayDa

2024-04-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47704.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45833
[https://github.com/apache/spark/pull/45833]

> JSON parsing fails with "java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to 
> org.apache.spark.sql.catalyst.util.ArrayData" when 
> spark.sql.json.enablePartialResults is enabled
> ---
>
> Key: SPARK-47704
> URL: https://issues.apache.org/jira/browse/SPARK-47704
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When reading the following JSON {"a":[{"key":{"b":0}}]}:
> {code:java}
> val df = spark.read.schema("a array<map<string, struct<b boolean>>>").json(path){code}
> Spark throws an exception:
> {code:java}
> Cause: java.lang.ClassCastException: class 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to class 
> org.apache.spark.sql.catalyst.util.ArrayData 
> (org.apache.spark.sql.catalyst.util.ArrayBasedMapData and 
> org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader 
> 'app')
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:53)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:53)
> at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:172)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:605)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:884)
> at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) {code}
>  
> The same happens for map: {"a":{"key":[{"b":0}]}} when array and map types
> are swapped.
> {code:java}
> val df = spark.read.schema("a map<string, array<struct<b boolean>>>").json(path) {code}
>  
> This is a corner case that https://issues.apache.org/jira/browse/SPARK-44940 
> missed.






[jira] [Resolved] (SPARK-41811) Implement SparkSession.sql's string formatter

2024-04-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41811.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45614
[https://github.com/apache/spark/pull/45614]

> Implement SparkSession.sql's string formatter
> -
>
> Key: SPARK-41811
> URL: https://issues.apache.org/jira/browse/SPARK-41811
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> **
> File "/.../spark/python/pyspark/sql/connect/session.py", line 345, in 
> pyspark.sql.connect.session.SparkSession.sql
> Failed example:
> spark.sql(
> "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}", 
> bound1=7, bound2=9
> ).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> spark.sql(
> TypeError: sql() got an unexpected keyword argument 'bound1'
> **
> File "/.../spark/python/pyspark/sql/connect/session.py", line 355, in 
> pyspark.sql.connect.session.SparkSession.sql
> Failed example:
> spark.sql(
> "SELECT {col} FROM {mydf} WHERE id IN {x}",
> col=mydf.id, mydf=mydf, x=tuple(range(4))).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> spark.sql(
> TypeError: sql() got an unexpected keyword argument 'col'
> {code}
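Conceptually, the fix lets the Connect session accept the same named parameters as the classic session; for plain Python values the substitution behaves much like Python's built-in `str.format` (a simplified sketch of the idea only — the real formatter also handles `Column` and `DataFrame` arguments and quoting):

```python
query = "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}"
# Substitute the named parameters into the SQL text, roughly as the
# session-side formatter does for plain Python values.
formatted = query.format(bound1=7, bound2=9)
print(formatted)  # SELECT * FROM range(10) WHERE id > 7 AND id < 9
```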






[jira] [Assigned] (SPARK-41811) Implement SparkSession.sql's string formatter

2024-04-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41811:


Assignee: Ruifeng Zheng

> Implement SparkSession.sql's string formatter
> -
>
> Key: SPARK-41811
> URL: https://issues.apache.org/jira/browse/SPARK-41811
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> **
> File "/.../spark/python/pyspark/sql/connect/session.py", line 345, in 
> pyspark.sql.connect.session.SparkSession.sql
> Failed example:
> spark.sql(
> "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}", 
> bound1=7, bound2=9
> ).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> spark.sql(
> TypeError: sql() got an unexpected keyword argument 'bound1'
> **
> File "/.../spark/python/pyspark/sql/connect/session.py", line 355, in 
> pyspark.sql.connect.session.SparkSession.sql
> Failed example:
> spark.sql(
> "SELECT {col} FROM {mydf} WHERE id IN {x}",
> col=mydf.id, mydf=mydf, x=tuple(range(4))).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> spark.sql(
> TypeError: sql() got an unexpected keyword argument 'col'
> {code}






[jira] [Created] (SPARK-47763) Re-enable Protobuf function doctests

2024-04-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47763:


 Summary: Re-enable Protobuf function doctests
 Key: SPARK-47763
 URL: https://issues.apache.org/jira/browse/SPARK-47763
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py

2024-04-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47762.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45924
[https://github.com/apache/spark/pull/45924]

> Add pyspark.sql.connect.protobuf into setup.py
> --
>
> Key: SPARK-47762
> URL: https://issues.apache.org/jira/browse/SPARK-47762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should add them. They are missing in the PyPI package.






[jira] [Updated] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py

2024-04-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47762:
-
Fix Version/s: 3.5.2

> Add pyspark.sql.connect.protobuf into setup.py
> --
>
> Key: SPARK-47762
> URL: https://issues.apache.org/jira/browse/SPARK-47762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> We should add them. They are missing in the PyPI package.






[jira] [Created] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py

2024-04-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47762:


 Summary: Add pyspark.sql.connect.protobuf into setup.py
 Key: SPARK-47762
 URL: https://issues.apache.org/jira/browse/SPARK-47762
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 3.5.1, 4.0.0
Reporter: Hyukjin Kwon


We should add them. They are missing in the PyPI package.






[jira] [Created] (SPARK-47760) Re-enable Avro function doctests

2024-04-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47760:


 Summary: Re-enable Avro function doctests
 Key: SPARK-47760
 URL: https://issues.apache.org/jira/browse/SPARK-47760
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-47756) Re-enable UDFProfilerParityTests for pyspark-connect

2024-04-07 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47756:


 Summary: Re-enable UDFProfilerParityTests for pyspark-connect
 Key: SPARK-47756
 URL: https://issues.apache.org/jira/browse/SPARK-47756
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-47757) Re-enable ResourceProfileTests for pyspark-connect

2024-04-07 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47757:


 Summary: Re-enable ResourceProfileTests for pyspark-connect
 Key: SPARK-47757
 URL: https://issues.apache.org/jira/browse/SPARK-47757
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47755) Pivot should fail when the number of distinct values is too large

2024-04-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47755.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45918
[https://github.com/apache/spark/pull/45918]

> Pivot should fail when the number of distinct values is too large
> -
>
> Key: SPARK-47755
> URL: https://issues.apache.org/jira/browse/SPARK-47755
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47752) Make pyspark.pandas compatible with pyspark-connect

2024-04-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47752.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45915
[https://github.com/apache/spark/pull/45915]

> Make pyspark.pandas compatible with pyspark-connect
> ---
>
> Key: SPARK-47752
> URL: https://issues.apache.org/jira/browse/SPARK-47752
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47753) Make pyspark.testing compatible with pyspark-connect

2024-04-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47753:


Assignee: Hyukjin Kwon

> Make pyspark.testing compatible with pyspark-connect
> 
>
> Key: SPARK-47753
> URL: https://issues.apache.org/jira/browse/SPARK-47753
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47753) Make pyspark.testing compatible with pyspark-connect

2024-04-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47753.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45916
[https://github.com/apache/spark/pull/45916]

> Make pyspark.testing compatible with pyspark-connect
> 
>
> Key: SPARK-47753
> URL: https://issues.apache.org/jira/browse/SPARK-47753
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47753) Make pyspark.testing compatible with pyspark-connect

2024-04-07 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47753:


 Summary: Make pyspark.testing compatible with pyspark-connect
 Key: SPARK-47753
 URL: https://issues.apache.org/jira/browse/SPARK-47753
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-47735) Make pyspark.testing.connectutils compatible with pyspark-connect

2024-04-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47735:


 Summary: Make pyspark.testing.connectutils compatible with 
pyspark-connect
 Key: SPARK-47735
 URL: https://issues.apache.org/jira/browse/SPARK-47735
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47734) Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query

2024-04-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47734.
--
Fix Version/s: 4.0.0
   3.5.2
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/45885

> Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping 
> streaming query
> -
>
> Key: SPARK-47734
> URL: https://issues.apache.org/jira/browse/SPARK-47734
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> https://issues.apache.org/jira/browse/SPARK-47199 didn't fix the flakiness in
> the pyspark.sql.dataframe.DataFrame.writeStream doctest: the problem is not a
> collision on the test but, rather, that the test starts a background thread to
> write to a directory and then deletes that directory from the main test
> thread, which is inherently race-prone.
> The fix is simple: stop the streaming query in the doctest itself, similar to 
> other streaming doctest examples.
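The race described above, and the fix of stopping the background writer before deleting the directory, can be sketched with the standard library alone. `BackgroundWriter` is an illustrative stand-in for the doctest's streaming query, not a PySpark API:

```python
import shutil
import tempfile
import threading
import time
from pathlib import Path

class BackgroundWriter:
    """Toy stand-in for a streaming query: keeps writing files into a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        i = 0
        while not self._stop.is_set():
            (self.directory / f"part-{i}.txt").write_text("data")
            i += 1
            time.sleep(0.01)

    def start(self):
        self._thread.start()

    def stop(self):
        # The fix: signal the writer and join it BEFORE any cleanup, so
        # nothing races with the directory removal below.
        self._stop.set()
        self._thread.join()

d = tempfile.mkdtemp()
writer = BackgroundWriter(d)
writer.start()
time.sleep(0.05)
writer.stop()      # stop the "query" first, as the fixed doctest does
shutil.rmtree(d)   # now safe: no writer is racing with the delete
print(Path(d).exists())  # False
```

Deleting `d` from the main thread while the writer thread is still running is the race the original doctest had; joining the thread first makes the cleanup deterministic.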






[jira] [Resolved] (SPARK-47565) PySpark workers dying in daemon mode idle queue fail query

2024-04-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47565.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45635
[https://github.com/apache/spark/pull/45635]

> PySpark workers dying in daemon mode idle queue fail query
> --
>
> Key: SPARK-47565
> URL: https://issues.apache.org/jira/browse/SPARK-47565
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.2, 3.5.1, 3.3.4
>Reporter: Sebastian Hillig
>Assignee: Nikita Awasthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> PySpark workers may die after entering the idle queue in
> `PythonWorkerFactory`, whether because of code that runs in the process or
> because of external factors.
> When drawn from the warm pool, such a worker causes an I/O exception on the
> first read/write.
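The failure mode can be sketched with a toy warm pool built on the standard library. The names `WorkerPool`, `acquire`, and `release` are illustrative only; the real `PythonWorkerFactory` protocol (sockets, daemon mode) is not modeled here:

```python
import subprocess
import sys
from collections import deque

# A trivial long-running worker that echoes stdin lines back on stdout.
WORKER_SRC = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)

class WorkerPool:
    """Toy warm pool: hands out idle worker processes, skipping dead ones."""

    def __init__(self):
        self.idle = deque()

    def _spawn(self):
        return subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def release(self, proc):
        self.idle.append(proc)

    def acquire(self):
        # Skip workers that died while sitting in the idle queue; without
        # this check, the caller would hit an I/O error on first read/write.
        while self.idle:
            proc = self.idle.popleft()
            if proc.poll() is None:
                return proc
            # Dead worker: discard it and keep looking.
        return self._spawn()

pool = WorkerPool()
w = pool._spawn()
pool.release(w)
w.kill()                 # worker dies while idle (e.g. killed externally)
w.wait()
fresh = pool.acquire()   # dead worker is skipped, a fresh one is spawned
print(fresh is not w)    # True
fresh.stdin.write("ping\n")
fresh.stdin.flush()
reply = fresh.stdout.readline().strip()
print(reply)             # ping
fresh.kill()
```

The `proc.poll()` check on acquisition is the essential idea: liveness is verified when the worker leaves the pool, not assumed from when it entered.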






[jira] [Assigned] (SPARK-47565) PySpark workers dying in daemon mode idle queue fail query

2024-04-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47565:


Assignee: Nikita Awasthi

> PySpark workers dying in daemon mode idle queue fail query
> --
>
> Key: SPARK-47565
> URL: https://issues.apache.org/jira/browse/SPARK-47565
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.2, 3.5.1, 3.3.4
>Reporter: Sebastian Hillig
>Assignee: Nikita Awasthi
>Priority: Major
>  Labels: pull-request-available
>
> PySpark workers may die after entering the idle queue in
> `PythonWorkerFactory`, whether because of code that runs in the process or
> because of external factors.
> When drawn from the warm pool, such a worker causes an I/O exception on the
> first read/write.






[jira] [Created] (SPARK-47727) Make SparkConf root level for both SparkSession and SparkContext

2024-04-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47727:


 Summary: Make SparkConf root level for both SparkSession and 
SparkContext
 Key: SPARK-47727
 URL: https://issues.apache.org/jira/browse/SPARK-47727
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-47725) Set up the CI for pyspark-connect package

2024-04-03 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47725:


 Summary: Set up the CI for pyspark-connect package
 Key: SPARK-47725
 URL: https://issues.apache.org/jira/browse/SPARK-47725
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47724) Add an environment variable for testing remote pure Python library

2024-04-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47724.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45868
[https://github.com/apache/spark/pull/45868]

> Add an environment variable for testing remote pure Python library
> --
>
> Key: SPARK-47724
> URL: https://issues.apache.org/jira/browse/SPARK-47724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47724) Add an environment variable for testing remote pure Python library

2024-04-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47724:


Assignee: Hyukjin Kwon

> Add an environment variable for testing remote pure Python library
> --
>
> Key: SPARK-47724
> URL: https://issues.apache.org/jira/browse/SPARK-47724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47724) Add an environment variable for testing remote pure Python library

2024-04-03 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47724:


 Summary: Add an environment variable for testing remote pure 
Python library
 Key: SPARK-47724
 URL: https://issues.apache.org/jira/browse/SPARK-47724
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47683) Separate pure Python packaging

2024-04-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47683.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45053
[https://github.com/apache/spark/pull/45053]

> Separate pure Python packaging
> --
>
> Key: SPARK-47683
> URL: https://issues.apache.org/jira/browse/SPARK-47683
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Initial version.





