[jira] [Updated] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47993: - Labels: release-notes (was: release-note) > Drop Python 3.8 support > --- > > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: release-notes > > Python 3.8 reaches end of life this October. Considering the release schedule, we should drop it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
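Dropping a Python version usually lands as a hard minimum-version check at import or setup time. A minimal sketch of such a guard, assuming a 3.9 floor once 3.8 support is dropped (the name `check_python_version` and the exact floor are illustrative, not the actual PySpark change):

```python
import sys

# Assumed floor once Python 3.8 support is dropped; the real value is
# whatever the PySpark 4.0 release settles on.
MIN_PYTHON = (3, 9)

def check_python_version(version_info=None):
    """Return True if the given (or current) interpreter meets the floor."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON

if not check_python_version():
    raise RuntimeError(
        "Python %d.%d is no longer supported; please upgrade to %d.%d+"
        % (sys.version_info[0], sys.version_info[1], *MIN_PYTHON)
    )
```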
[jira] [Resolved] (SPARK-47962) Improve doc test in pyspark dataframe
[ https://issues.apache.org/jira/browse/SPARK-47962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47962. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46189 [https://github.com/apache/spark/pull/46189] > Improve doc test in pyspark dataframe > - > > Key: SPARK-47962 > URL: https://issues.apache.org/jira/browse/SPARK-47962 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The doc test for the DataFrame observe API doesn't use a streaming DataFrame, which is incorrect. We should start a streaming DataFrame to make sure it runs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46197 [https://github.com/apache/spark/pull/46197] > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > -- > > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>    import ConfigHelpers._
>
>    def this(parent: ConfigBuilder, converter: String => T) = {
> -    this(parent, converter, Option(_).map(_.toString).orNull)
> +    this(parent, converter, { v: T => v.toString })
>    }
>
>    /** Apply a transformation to the user-provided values of the config entry. */
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
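The Scala fix above replaces a null-propagating stringifier with one that assumes non-null input. A rough Python analogue of the before/after behavior (illustrative only; function names are hypothetical, not Spark code):

```python
# Before the fix: analogous to Option(_).map(_.toString).orNull --
# a null value silently survives as a null "string" config value.
def to_string_or_null(v):
    return None if v is None else str(v)

# After the fix: analogous to { v: T => v.toString } -- assumes the
# value is never null, so the result is always a real string.
def to_string(v):
    return str(v)

print(to_string_or_null(None))  # None leaks through
print(to_string(42))            # 42
```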
[jira] [Assigned] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47965: Assignee: Hyukjin Kwon > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > -- > > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>    import ConfigHelpers._
>
>    def this(parent: ConfigBuilder, converter: String => T) = {
> -    this(parent, converter, Option(_).map(_.toString).orNull)
> +    this(parent, converter, { v: T => v.toString })
>    }
>
>    /** Apply a transformation to the user-provided values of the config entry. */
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47964) Hide SQLContext and HiveContext
[ https://issues.apache.org/jira/browse/SPARK-47964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47964. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46194 [https://github.com/apache/spark/pull/46194] > Hide SQLContext and HiveContext > --- > > Key: SPARK-47964 > URL: https://issues.apache.org/jira/browse/SPARK-47964 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Issue Type: Improvement (was: Bug) > Avoid orNull in TypedConfigBuilder > -- > > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>    import ConfigHelpers._
>
>    def this(parent: ConfigBuilder, converter: String => T) = {
> -    this(parent, converter, Option(_).map(_.toString).orNull)
> +    this(parent, converter, { v: T => v.toString })
>    }
>
>    /** Apply a transformation to the user-provided values of the config entry. */
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Summary: Avoid orNull in TypedConfigBuilder and OptionalConfigEntry (was: Avoid orNull in TypedConfigBuilder) > Avoid orNull in TypedConfigBuilder and OptionalConfigEntry > -- > > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>    import ConfigHelpers._
>
>    def this(parent: ConfigBuilder, converter: String => T) = {
> -    this(parent, converter, Option(_).map(_.toString).orNull)
> +    this(parent, converter, { v: T => v.toString })
>    }
>
>    /** Apply a transformation to the user-provided values of the config entry. */
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47965) Avoid orNull in TypedConfigBuilder
[ https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47965: - Priority: Minor (was: Major) > Avoid orNull in TypedConfigBuilder > -- > > Key: SPARK-47965 > URL: https://issues.apache.org/jira/browse/SPARK-47965 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>    import ConfigHelpers._
>
>    def this(parent: ConfigBuilder, converter: String => T) = {
> -    this(parent, converter, Option(_).map(_.toString).orNull)
> +    this(parent, converter, { v: T => v.toString })
>    }
>
>    /** Apply a transformation to the user-provided values of the config entry. */
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47965) Avoid orNull in TypedConfigBuilder
Hyukjin Kwon created SPARK-47965: Summary: Avoid orNull in TypedConfigBuilder Key: SPARK-47965 URL: https://issues.apache.org/jira/browse/SPARK-47965 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Configuration values/keys cannot be nulls. We should fix:
{code}
diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
index 1f19e9444d38..d06535722625 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
@@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
   import ConfigHelpers._

   def this(parent: ConfigBuilder, converter: String => T) = {
-    this(parent, converter, Option(_).map(_.toString).orNull)
+    this(parent, converter, { v: T => v.toString })
   }

   /** Apply a transformation to the user-provided values of the config entry. */
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47933: Assignee: Hyukjin Kwon > Parent Column class for Spark Connect and Spark Classic > --- > > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47933. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46155 [https://github.com/apache/spark/pull/46155] > Parent Column class for Spark Connect and Spark Classic > --- > > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47903. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46122 [https://github.com/apache/spark/pull/46122] > Add remaining scalar types to the Python variant library > > > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Added support for reading the remaining scalar data types (binary, timestamp, > timestamp_ntz, date, float) to the Python Variant library. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47903: Assignee: Harsh Motwani > Add remaining scalar types to the Python variant library > > > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Added support for reading the remaining scalar data types (binary, timestamp, > timestamp_ntz, date, float) to the Python Variant library. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47890) Add python and scala dataframe variant expression aliases.
[ https://issues.apache.org/jira/browse/SPARK-47890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47890. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46123 [https://github.com/apache/spark/pull/46123] > Add python and scala dataframe variant expression aliases. > -- > > Key: SPARK-47890 > URL: https://issues.apache.org/jira/browse/SPARK-47890 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
Hyukjin Kwon created SPARK-47933: Summary: Parent Column class for Spark Connect and Spark Classic Key: SPARK-47933 URL: https://issues.apache.org/jira/browse/SPARK-47933 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839436#comment-17839436 ] Hyukjin Kwon commented on SPARK-47909: -- Yes, I am working on it today :-). > Parent DataFrame class for Spark Connect and Spark Classic > -- > > Key: SPARK-47909 > URL: https://issues.apache.org/jira/browse/SPARK-47909 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47909. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46129 [https://github.com/apache/spark/pull/46129] > Parent DataFrame class for Spark Connect and Spark Classic > -- > > Key: SPARK-47909 > URL: https://issues.apache.org/jira/browse/SPARK-47909 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47909: Assignee: Hyukjin Kwon > Parent DataFrame class for Spark Connect and Spark Classic > -- > > Key: SPARK-47909 > URL: https://issues.apache.org/jira/browse/SPARK-47909 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic
Hyukjin Kwon created SPARK-47909: Summary: Parent DataFrame class for Spark Connect and Spark Classic Key: SPARK-47909 URL: https://issues.apache.org/jira/browse/SPARK-47909 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47767: Assignee: guihuawen > Show offset value in TakeOrderedAndProjectExec > -- > > Key: SPARK-47767 > URL: https://issues.apache.org/jira/browse/SPARK-47767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0, 4.0.0 >Reporter: guihuawen >Assignee: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Show the offset value in TakeOrderedAndProjectExec. For example:
>
> explain select * from test_limit_offset order by a limit 2 offset 1;
>
> == Physical Plan ==
> TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])
> +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition Cols: []]
>
> No offset is displayed. Showing it would be more user-friendly.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47767) Show offset value in TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-47767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47767. -- Resolution: Fixed Issue resolved by pull request 45931 [https://github.com/apache/spark/pull/45931] > Show offset value in TakeOrderedAndProjectExec > -- > > Key: SPARK-47767 > URL: https://issues.apache.org/jira/browse/SPARK-47767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0, 3.5.0, 4.0.0 >Reporter: guihuawen >Assignee: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Show the offset value in TakeOrderedAndProjectExec. For example:
>
> explain select * from test_limit_offset order by a limit 2 offset 1;
>
> == Physical Plan ==
> TakeOrderedAndProject(limit=3, orderBy=[a#171 ASC NULLS FIRST], output=[a#171])
> +- Scan hive spark_catalog.bigdata_qa.test_limit_offset [a#171], HiveTableRelation [`spark_catalog`.`test`.`test_limit_offset`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [a#171], Partition Cols: []]
>
> No offset is displayed. Showing it would be more user-friendly.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
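The plan above shows limit=3 for LIMIT 2 OFFSET 1 because the physical operator fetches limit + offset rows in sorted order and then skips the first offset rows. A toy Python sketch of that folding (illustrative only, not Spark's implementation):

```python
def take_ordered_with_offset(rows, key, limit, offset):
    # Fetch limit + offset rows in sorted order (hence limit=3 in the
    # plan for LIMIT 2 OFFSET 1), then drop the first `offset` rows.
    top = sorted(rows, key=key)[: limit + offset]
    return top[offset:]

print(take_ordered_with_offset([5, 3, 1, 4, 2], key=lambda x: x, limit=2, offset=1))
# -> [2, 3]
```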
[jira] [Assigned] (SPARK-47852) Support DataFrameQueryContext for reverse operations
[ https://issues.apache.org/jira/browse/SPARK-47852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47852: Assignee: Haejoon Lee > Support DataFrameQueryContext for reverse operations > > > Key: SPARK-47852 > URL: https://issues.apache.org/jira/browse/SPARK-47852 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > To improve error message for reverse ops -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
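For context, "reverse operations" here refers to Python's reflected dunder methods, which run when the left operand cannot handle the operator. A minimal plain-Python illustration (the `Col` class is hypothetical, not PySpark's Column):

```python
class Col:
    """Toy column wrapper illustrating reverse (reflected) operations."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Handles `col + 1`.
        return "(%s + %s)" % (self.name, other)

    def __radd__(self, other):
        # Reverse operation: handles `1 + col`, because int.__add__
        # returns NotImplemented for a Col operand.
        return "(%s + %s)" % (other, self.name)

c = Col("x")
print(c + 1)  # (x + 1)
print(1 + c)  # (1 + x)
```

Capturing a useful query context for the `__radd__` path is what the ticket's error-message improvement targets.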
[jira] [Resolved] (SPARK-47858) Refactoring the structure for DataFrame error context
[ https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47858. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46063 [https://github.com/apache/spark/pull/46063] > Refactoring the structure for DataFrame error context > - > > Key: SPARK-47858 > URL: https://issues.apache.org/jira/browse/SPARK-47858 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The current implementation for PySpark DataFrame error context could be more > flexible by addressing some hacky spots. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47852) Support DataFrameQueryContext for reverse operations
[ https://issues.apache.org/jira/browse/SPARK-47852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47852. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46063 [https://github.com/apache/spark/pull/46063] > Support DataFrameQueryContext for reverse operations > > > Key: SPARK-47852 > URL: https://issues.apache.org/jira/browse/SPARK-47852 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To improve error message for reverse ops -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47858) Refactoring the structure for DataFrame error context
[ https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47858: Assignee: Haejoon Lee > Refactoring the structure for DataFrame error context > - > > Key: SPARK-47858 > URL: https://issues.apache.org/jira/browse/SPARK-47858 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > The current implementation for PySpark DataFrame error context could be more > flexible by addressing some hacky spots. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47864) Enhance "Installation" page to cover all installable options
[ https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47864. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46096 [https://github.com/apache/spark/pull/46096] > Enhance "Installation" page to cover all installable options > > > Key: SPARK-47864 > URL: https://issues.apache.org/jira/browse/SPARK-47864 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Like Installation page from Pandas, we might need to cover all installable > options with related dependencies from our Installation documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47864) Enhance "Installation" page to cover all installable options
[ https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47864: Assignee: Haejoon Lee > Enhance "Installation" page to cover all installable options > > > Key: SPARK-47864 > URL: https://issues.apache.org/jira/browse/SPARK-47864 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Like Installation page from Pandas, we might need to cover all installable > options with related dependencies from our Installation documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47891) Improve docstring of mapInPandas
[ https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47891. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46108 [https://github.com/apache/spark/pull/46108] > Improve docstring of mapInPandas > > > Key: SPARK-47891 > URL: https://issues.apache.org/jira/browse/SPARK-47891 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Improve the docstring of mapInPandas:
> * "using a Python native function that takes and outputs a pandas DataFrame" is confusing because the function takes and outputs an ITERATOR of pandas DataFrames instead.
> * "All columns are passed together as an iterator of pandas DataFrames" easily misleads users into thinking the entire DataFrame will be passed together; "a batch of rows" is used instead.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
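The docstring point above is the iterator contract: the user function receives and yields an iterator of batches, never one whole DataFrame. A plain-Python sketch of that contract, with lists of dicts standing in for pandas DataFrames (illustrative only, not the mapInPandas API itself):

```python
def keep_id_one(batches):
    # The user function receives an ITERATOR of batches and must also
    # yield batches -- it never sees the whole dataset at once.
    for batch in batches:
        yield [row for row in batch if row["id"] == 1]

batches = iter([[{"id": 1}, {"id": 2}], [{"id": 1}]])
flat = [row for batch in keep_id_one(batches) for row in batch]
print(flat)  # [{'id': 1}, {'id': 1}]
```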
[jira] [Resolved] (SPARK-47830) Re-enable ResourceProfileTests for pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47830. -- Fix Version/s: 4.0.0 Assignee: Hyukjin Kwon Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46090 > Re-enable ResourceProfileTests for pyspark-connect > - > > Key: SPARK-47830 > URL: https://issues.apache.org/jira/browse/SPARK-47830 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47885. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46100 [https://github.com/apache/spark/pull/46100] > Make pyspark.resource compatible with pyspark-connect > - > > Key: SPARK-47885 > URL: https://issues.apache.org/jira/browse/SPARK-47885 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47540) SPIP: Pure Python Package (Spark Connect)
[ https://issues.apache.org/jira/browse/SPARK-47540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47540. -- Fix Version/s: 4.0.0 Assignee: Hyukjin Kwon Resolution: Done > SPIP: Pure Python Package (Spark Connect) > - > > Key: SPARK-47540 > URL: https://issues.apache.org/jira/browse/SPARK-47540 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > Fix For: 4.0.0 > >
> *Q1. What are you trying to do? Articulate your objectives using absolutely no jargon.*
> As part of the [Spark Connect|https://spark.apache.org/docs/latest/spark-connect-overview.html] development, we have introduced Scala and Python clients. While the Scala client is already provided as a separate library and is available in Maven, the Python client is not. This proposal aims for end users to install the pure Python package for Spark Connect by using pip install pyspark-connect. The pure Python package contains only Python source code without jars, which reduces the size of the package significantly and widens the use cases of PySpark. See also [Introducing Spark Connect - The Power of Apache Spark, Everywhere|https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html].
> *Q2. What problem is this proposal NOT designed to solve?*
> This proposal does not aim to:
> - Change the existing PySpark package, e.g., pip install pyspark is not affected
> - Implement full compatibility with classic PySpark, e.g., implementing the RDD API
> - Address how to launch the Spark Connect server; the Spark Connect server is launched by users themselves
> - Support local mode; without launching the Spark Connect server, users cannot use this package
> - Change the [official release channel|https://spark.apache.org/downloads.html]; only PyPI is affected
> *Q3.
How is it done today, and what are the limits of current practice?* > Currently, we run pip install pyspark, and it is over 300MB because of > dependent jars. In addition, PySpark requires you to set up other > environments, such as a JDK installation. > This is not suitable when the running environment and resources are limited, > as on edge devices such as smart home devices. > Requiring a non-Python environment is also not Python friendly. > *Q4. What is new in your approach and why do you think it will be successful?* > It provides a pure Python library, which eliminates other environment > requirements such as the JDK, reduces resource usage by decoupling the Spark > Driver, and reduces the package size. > *Q5. Who cares? If you are successful, what difference will it make?* > Users who want to leverage Spark in a limited environment, and who want to > decouple the JVM-based Spark Driver to run Spark as a service. They can > simply pip install pyspark-connect, which does not require other dependencies > (except Python dependencies, just like other Python libraries). > *Q6. What are the risks?* > Because we do not change the existing PySpark package, I do not see any major > risk in classic PySpark itself. We will reuse the same Python source, and > therefore we should make sure no Py4J is used, and no JVM access is made. > This requirement might confuse developers. At the very least, we should > add a dedicated CI to make sure the pure Python package works. > *Q7. How long will it take?* > I expect around one month including CI setup. In fact, the prototype is > ready, so I expect this to be done sooner. > *Q8. What are the mid-term and final “exams” to check for success?* > The mid-term goal is to set up a scheduled CI job that builds the pure Python > library and runs all the tests against it. > The final goal would be to properly test the end-to-end use case from pip > installation. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
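The SPIP above describes clients connecting to a Spark Connect server via a connection string of the shape sc://host:port (see the Spark Connect overview linked in Q1). As a rough illustration of that connection-string shape, a minimal parser might look like the following sketch; the helper name is hypothetical, and the 15002 fallback is an assumption based on the Spark Connect documentation's default server port:

```python
from urllib.parse import urlparse

def parse_connect_url(url: str):
    """Parse a Spark Connect style URL such as sc://localhost:15002."""
    parsed = urlparse(url)
    if parsed.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got {url!r}")
    # Fall back to 15002, the documented default Spark Connect server port.
    return parsed.hostname, parsed.port or 15002

host, port = parse_connect_url("sc://localhost:15002")
```

In the real package, this URL would be passed to SparkSession.builder.remote(...) rather than parsed by hand.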
[jira] [Resolved] (SPARK-47884) Switch ANSI SQL CI job to NON-ANSI SQL CI job
[ https://issues.apache.org/jira/browse/SPARK-47884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47884. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46099 [https://github.com/apache/spark/pull/46099] > Switch ANSI SQL CI job to NON-ANSI SQL CI job > - > > Key: SPARK-47884 > URL: https://issues.apache.org/jira/browse/SPARK-47884 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47807) Make pyspark.ml compatible with pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47807: - Summary: Make pyspark.ml compatible with pyspark-connect (was: Make pyspark.ml compatible witbh pyspark-connect) > Make pyspark.ml compatible with pyspark-connect > --- > > Key: SPARK-47807 > URL: https://issues.apache.org/jira/browse/SPARK-47807 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect
Hyukjin Kwon created SPARK-47885: Summary: Make pyspark.resource compatible with pyspark-connect Key: SPARK-47885 URL: https://issues.apache.org/jira/browse/SPARK-47885 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46375. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46089 [https://github.com/apache/spark/pull/46089] > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add documentation (user guide) for Python data source API. > > Note the documentation should clarify the required dependency: pyarrow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46375: Assignee: Allison Wang > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Add documentation (user guide) for Python data source API. > > Note the documentation should clarify the required dependency: pyarrow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47877) Speed up test_parity_listener
[ https://issues.apache.org/jira/browse/SPARK-47877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47877. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46072 [https://github.com/apache/spark/pull/46072] > Speed up test_parity_listener > - > > Key: SPARK-47877 > URL: https://issues.apache.org/jira/browse/SPARK-47877 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47760) Re-enable Avro function doctests
[ https://issues.apache.org/jira/browse/SPARK-47760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47760. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46055 [https://github.com/apache/spark/pull/46055] > Re-enable Avro function doctests > --- > > Key: SPARK-47760 > URL: https://issues.apache.org/jira/browse/SPARK-47760 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47763) Re-enable Protobuf function doctests
[ https://issues.apache.org/jira/browse/SPARK-47763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47763. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46055 [https://github.com/apache/spark/pull/46055] > Re-enable Protobuf function doctests > --- > > Key: SPARK-47763 > URL: https://issues.apache.org/jira/browse/SPARK-47763 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[ https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47818: Assignee: Xi Lyu > Introduce plan cache in SparkConnectPlanner to improve performance of Analyze > requests > -- > > Key: SPARK-47818 > URL: https://issues.apache.org/jira/browse/SPARK-47818 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xi Lyu >Assignee: Xi Lyu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > While building a DataFrame step by step, each step generates a new DataFrame > with an empty schema that is lazily computed on access. However, > if a user's code frequently accesses the schema of these new DataFrames using > methods such as `df.columns`, it will result in a large number of Analyze > requests to the server. Each time, the entire plan needs to be reanalyzed, > leading to poor performance, especially when constructing highly complex > plans. > Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the > overhead of repeated analysis during this process. This is achieved by saving > significant computation if the resolved logical plan of a subtree can be > cached. > A minimal example of the problem: > {code:java} > import pyspark.sql.functions as F > df = spark.range(10) > for i in range(200): > if str(i) not in df.columns: # <-- The df.columns call causes a new Analyze > request in every iteration > df = df.withColumn(str(i), F.col("id") + i) > df.show() {code} > With this patch, the performance of the above code improved from ~110s to ~5s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[ https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47818. -- Resolution: Fixed Issue resolved by pull request 46012 [https://github.com/apache/spark/pull/46012] > Introduce plan cache in SparkConnectPlanner to improve performance of Analyze > requests > -- > > Key: SPARK-47818 > URL: https://issues.apache.org/jira/browse/SPARK-47818 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xi Lyu >Assignee: Xi Lyu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > While building a DataFrame step by step, each step generates a new DataFrame > with an empty schema that is lazily computed on access. However, > if a user's code frequently accesses the schema of these new DataFrames using > methods such as `df.columns`, it will result in a large number of Analyze > requests to the server. Each time, the entire plan needs to be reanalyzed, > leading to poor performance, especially when constructing highly complex > plans. > Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the > overhead of repeated analysis during this process. This is achieved by saving > significant computation if the resolved logical plan of a subtree can be > cached. > A minimal example of the problem: > {code:java} > import pyspark.sql.functions as F > df = spark.range(10) > for i in range(200): > if str(i) not in df.columns: # <-- The df.columns call causes a new Analyze > request in every iteration > df = df.withColumn(str(i), F.col("id") + i) > df.show() {code} > With this patch, the performance of the above code improved from ~110s to ~5s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
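The caching idea described in the issue above can be sketched outside of Spark: memoize the analysis result keyed by the (hashable) unresolved plan, so repeated Analyze requests for the same subtree skip the expensive re-analysis. This is a hypothetical PlanCache, not the actual SparkConnectPlanner code:

```python
# Hedged sketch of the plan-cache idea, not Spark's implementation:
# cache resolved results keyed by the unresolved plan so that repeated
# Analyze requests for the same plan hit the cache instead of re-resolving.
class PlanCache:
    def __init__(self):
        self._cache = {}
        self.analyze_calls = 0  # counts only cache misses (expensive work)

    def analyze(self, plan):
        if plan in self._cache:
            return self._cache[plan]
        self.analyze_calls += 1
        resolved = f"resolved({plan})"  # stand-in for full plan resolution
        self._cache[plan] = resolved
        return resolved

cache = PlanCache()
for _ in range(200):
    cache.analyze("range(10)")  # same subtree: analyzed once, then cached
```

Like the df.columns loop in the issue, 200 lookups of the same plan trigger only one real analysis.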
[jira] [Resolved] (SPARK-47862) Connect generated protos can't be pickled
[ https://issues.apache.org/jira/browse/SPARK-47862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47862. -- Resolution: Fixed Issue resolved by pull request 46068 [https://github.com/apache/spark/pull/46068] > Connect generated protos can't be pickled > - > > Key: SPARK-47862 > URL: https://issues.apache.org/jira/browse/SPARK-47862 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.1 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When Spark Connect generates the protobuf files, they're manually adjusted > and moved to the right folder. However, we did not fix the package for the > descriptor. This breaks serializing them to proto. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47862) Connect generated protos can't be pickled
[ https://issues.apache.org/jira/browse/SPARK-47862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47862: Assignee: Martin Grund > Connect generated protos can't be pickled > - > > Key: SPARK-47862 > URL: https://issues.apache.org/jira/browse/SPARK-47862 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.1 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When Spark Connect generates the protobuf files, they're manually adjusted > and moved to the right folder. However, we did not fix the package for the > descriptor. This breaks serializing them to proto. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47371) XML: Ignore row tags in CDATA Tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47371: Assignee: Yousof Hosny > XML: Ignore row tags in CDATA Tokenizer > --- > > Key: SPARK-47371 > URL: https://issues.apache.org/jira/browse/SPARK-47371 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yousof Hosny >Assignee: Yousof Hosny >Priority: Minor > Labels: pull-request-available > > The current parser does not recognize CDATA sections and thus will read row > tags that are enclosed within a CDATA section. The expected behavior is for > none of the following rows to be read, but they are all read. > {code:java} > // BUG: rowTag in CDATA section > val xmlString=""" > > > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47371) XML: Ignore row tags in CDATA Tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47371. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45487 [https://github.com/apache/spark/pull/45487] > XML: Ignore row tags in CDATA Tokenizer > --- > > Key: SPARK-47371 > URL: https://issues.apache.org/jira/browse/SPARK-47371 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yousof Hosny >Assignee: Yousof Hosny >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > The current parser does not recognize CDATA sections and thus will read row > tags that are enclosed within a CDATA section. The expected behavior is for > none of the following rows to be read, but they are all read. > {code:java} > // BUG: rowTag in CDATA section > val xmlString=""" > > > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
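The expected semantics described in the issue above can be illustrated with Python's standard-library XML parser, which correctly treats CDATA content as character data rather than markup (this is an illustration of the desired behavior, not Spark's XML tokenizer):

```python
import xml.etree.ElementTree as ET

# A <row> tag inside a CDATA section is character data, not an element,
# so a conforming parser surfaces only the one real <row> element.
doc = "<rows><![CDATA[<row>not a real row</row>]]><row>real</row></rows>"
root = ET.fromstring(doc)
rows = root.findall("row")
```

The CDATA text ends up as the parent element's text content instead of producing a second row.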
[jira] [Resolved] (SPARK-47866) Deflake PythonForeachWriterSuite
[ https://issues.apache.org/jira/browse/SPARK-47866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47866. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46070 [https://github.com/apache/spark/pull/46070] > Deflake PythonForeachWriterSuite > > > Key: SPARK-47866 > URL: https://issues.apache.org/jira/browse/SPARK-47866 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47866) Deflake PythonForeachWriterSuite
[ https://issues.apache.org/jira/browse/SPARK-47866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47866: Assignee: Dongjoon Hyun > Deflake PythonForeachWriterSuite > > > Key: SPARK-47866 > URL: https://issues.apache.org/jira/browse/SPARK-47866 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47851) Document pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47851. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46054 [https://github.com/apache/spark/pull/46054] > Document pyspark-connect package > > > Key: SPARK-47851 > URL: https://issues.apache.org/jira/browse/SPARK-47851 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47851) Document pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47851: Assignee: Hyukjin Kwon > Document pyspark-connect package > > > Key: SPARK-47851 > URL: https://issues.apache.org/jira/browse/SPARK-47851 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47851) Document pyspark-connect package
Hyukjin Kwon created SPARK-47851: Summary: Document pyspark-connect package Key: SPARK-47851 URL: https://issues.apache.org/jira/browse/SPARK-47851 Project: Spark Issue Type: Sub-task Components: Connect, Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47757) Re-enable MemoryProfilerParityTests for pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47757: - Summary: Re-enable MemoryProfilerParityTests for pyspark-connect (was: Re-enable ResourceProfileTests for pyspark-connect) > Re-enable MemoryProfilerParityTests for pyspark-connect > -- > > Key: SPARK-47757 > URL: https://issues.apache.org/jira/browse/SPARK-47757 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47830) Re-enable ResourceProfileTests for pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47830: - Summary: Re-enable ResourceProfileTests for pyspark-connect (was: Re-enable MemoryProfilerParityTests for pyspark-connect) > Re-enable ResourceProfileTests for pyspark-connect > - > > Key: SPARK-47830 > URL: https://issues.apache.org/jira/browse/SPARK-47830 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47849) Change release script to release pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47849. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46049 [https://github.com/apache/spark/pull/46049] > Change release script to release pyspark-connect > > > Key: SPARK-47849 > URL: https://issues.apache.org/jira/browse/SPARK-47849 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47849) Change release script to release pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47849: Assignee: Hyukjin Kwon > Change release script to release pyspark-connect > > > Key: SPARK-47849 > URL: https://issues.apache.org/jira/browse/SPARK-47849 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47757) Re-enable ResourceProfileTests for pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47757. -- Fix Version/s: 4.0.0 Assignee: Hyukjin Kwon Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46036 > Re-enable ResourceProfileTests for pyspark-connect > - > > Key: SPARK-47757 > URL: https://issues.apache.org/jira/browse/SPARK-47757 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47756) Re-enable UDFProfilerParityTests for pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47756. -- Fix Version/s: 4.0.0 Assignee: Hyukjin Kwon Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46036 > Re-enable UDFProfilerParityTests for pyspark-connect > --- > > Key: SPARK-47756 > URL: https://issues.apache.org/jira/browse/SPARK-47756 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47849) Change release script to release pyspark-connect
Hyukjin Kwon created SPARK-47849: Summary: Change release script to release pyspark-connect Key: SPARK-47849 URL: https://issues.apache.org/jira/browse/SPARK-47849 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47812) Support Serializing Spark Sessions in ForEachBatch
[ https://issues.apache.org/jira/browse/SPARK-47812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47812. -- Resolution: Fixed Issue resolved by pull request 46002 [https://github.com/apache/spark/pull/46002] > Support Serializing Spark Sessions in ForEachBatch > -- > > Key: SPARK-47812 > URL: https://issues.apache.org/jira/browse/SPARK-47812 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SparkSessions using Connect should be serialized when used in ForEachBatch > and friends. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
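A common way to make a client session object picklable, in the spirit of the issue above (a generic sketch under assumptions, not the actual Spark Connect implementation), is to serialize only the connection parameters and reconnect on deserialization:

```python
import pickle

class ConnectSession:
    """Hypothetical stand-in for a Spark Connect client session."""

    def __init__(self, url):
        self.url = url
        self._channel = object()  # stand-in for an unpicklable gRPC channel

    def __reduce__(self):
        # Pickle only the connection string; a fresh session (and channel)
        # is constructed when the object is unpickled on the other side.
        return (ConnectSession, (self.url,))

restored = pickle.loads(pickle.dumps(ConnectSession("sc://localhost:15002")))
```

This pattern lets closures that capture the session (as in foreachBatch callbacks) be shipped to workers without dragging the live connection along.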
[jira] [Assigned] (SPARK-47812) Support Serializing Spark Sessions in ForEachBatch
[ https://issues.apache.org/jira/browse/SPARK-47812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47812: Assignee: Martin Grund > Support Serializing Spark Sessions in ForEachBatch > -- > > Key: SPARK-47812 > URL: https://issues.apache.org/jira/browse/SPARK-47812 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SparkSessions using Connect should be serialized when used in ForEachBatch > and friends. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47831) Run Pandas API on Spark for pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47831. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46001 [https://github.com/apache/spark/pull/46001] > Run Pandas API on Spark for pyspark-connect package > --- > > Key: SPARK-47831 > URL: https://issues.apache.org/jira/browse/SPARK-47831 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47831) Run Pandas API on Spark for pyspark-connect package
Hyukjin Kwon created SPARK-47831: Summary: Run Pandas API on Spark for pyspark-connect package Key: SPARK-47831 URL: https://issues.apache.org/jira/browse/SPARK-47831 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47830) Re-enable MemoryProfilerParityTests for pyspark-connect
Hyukjin Kwon created SPARK-47830: Summary: Re-enable MemoryProfilerParityTests for pyspark-connect Key: SPARK-47830 URL: https://issues.apache.org/jira/browse/SPARK-47830 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47827) Missing warnings for deprecated features
[ https://issues.apache.org/jira/browse/SPARK-47827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47827. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46021 [https://github.com/apache/spark/pull/46021] > Missing warnings for deprecated features > > > Key: SPARK-47827 > URL: https://issues.apache.org/jira/browse/SPARK-47827 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > There are some APIs that will be removed but are missing deprecation warnings -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
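A deprecation warning of the kind this issue adds typically looks like the following (a generic sketch with hypothetical API names; the actual PySpark APIs and messages are in the linked pull request):

```python
import warnings

def old_api():
    # Deprecation warning aimed at the caller's line (stacklevel=2),
    # so users see where *they* invoked the deprecated API.
    warnings.warn(
        "old_api is deprecated and will be removed in a future release; "
        "use new_api instead.",
        FutureWarning,
        stacklevel=2,
    )
    return 42  # stand-in for the real behavior

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_api()
```

FutureWarning (rather than DeprecationWarning) is shown to end users by default, which is why it is common for user-facing library deprecations.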
[jira] [Assigned] (SPARK-47827) Missing warnings for deprecated features
[ https://issues.apache.org/jira/browse/SPARK-47827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47827: Assignee: Haejoon Lee > Missing warnings for deprecated features > > > Key: SPARK-47827 > URL: https://issues.apache.org/jira/browse/SPARK-47827 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > There are some APIs that will be removed but are missing deprecation warnings -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47174) Client Side Listener - Server side implementation
[ https://issues.apache.org/jira/browse/SPARK-47174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47174. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45988 [https://github.com/apache/spark/pull/45988] > Client Side Listener - Server side implementation > - > > Key: SPARK-47174 > URL: https://issues.apache.org/jira/browse/SPARK-47174 > Project: Spark > Issue Type: Improvement > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47174) Client Side Listener - Server side implementation
[ https://issues.apache.org/jira/browse/SPARK-47174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47174: Assignee: Wei Liu > Client Side Listener - Server side implementation > - > > Key: SPARK-47174 > URL: https://issues.apache.org/jira/browse/SPARK-47174 > Project: Spark > Issue Type: Improvement > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47824) Nondeterminism in pyspark.pandas.series.asof
[ https://issues.apache.org/jira/browse/SPARK-47824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47824. -- Fix Version/s: 3.4.3 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46018 [https://github.com/apache/spark/pull/46018] > Nondeterminism in pyspark.pandas.series.asof > > > Key: SPARK-47824 > URL: https://issues.apache.org/jira/browse/SPARK-47824 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4 >Reporter: Mark Jarvin >Assignee: Mark Jarvin >Priority: Major > Labels: pull-request-available > Fix For: 3.4.3, 3.5.2, 4.0.0 > > > `max_by` in `pyspark.pandas.series.asof` uses a literal string instead of a > generated column as its ordering condition, resulting in nondeterminism. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47824) Nondeterminism in pyspark.pandas.series.asof
[ https://issues.apache.org/jira/browse/SPARK-47824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47824: Assignee: Mark Jarvin > Nondeterminism in pyspark.pandas.series.asof > > > Key: SPARK-47824 > URL: https://issues.apache.org/jira/browse/SPARK-47824 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.2, 4.0.0, 3.5.1, 3.3.4 >Reporter: Mark Jarvin >Assignee: Mark Jarvin >Priority: Major > Labels: pull-request-available > > `max_by` in `pyspark.pandas.series.asof` uses a literal string instead of a > generated column as its ordering condition, resulting in nondeterminism. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
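The nondeterminism in SPARK-47824 is easy to see without Spark: if the ordering condition passed to `max_by` is a constant (as a literal string is), every row ties, and the "maximum" is whichever row the engine happens to encounter first. A pure-Python sketch of the same hazard (illustrative only, not the pyspark.pandas code):

```python
def max_by_constant_key(rows):
    # The ordering key is a constant, like a literal string column:
    # every row compares equal, so max() degenerates to "first row seen".
    return max(rows, key=lambda r: 0)

rows = [("a", 1), ("b", 2), ("c", 3)]

# Python's max() breaks ties by keeping the first maximal element, so the
# result tracks input order rather than any property of the data. In a
# distributed engine, where partition order varies run to run, that is
# exactly the nondeterminism described above.
assert max_by_constant_key(rows) == ("a", 1)
assert max_by_constant_key(list(reversed(rows))) == ("c", 3)
```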
[jira] [Assigned] (SPARK-47826) Add VariantVal for PySpark
[ https://issues.apache.org/jira/browse/SPARK-47826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47826: Assignee: Gene Pang > Add VariantVal for PySpark > -- > > Key: SPARK-47826 > URL: https://issues.apache.org/jira/browse/SPARK-47826 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Gene Pang >Assignee: Gene Pang >Priority: Major > Fix For: 4.0.0 > > > Add a `VariantVal` implementation for PySpark. It includes convenience > methods to convert the Variant to a string, or to a Python object, so that > users can more easily work with Variant data. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47826) Add VariantVal for PySpark
[ https://issues.apache.org/jira/browse/SPARK-47826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47826. -- Resolution: Fixed Fixed in https://github.com/apache/spark/pull/45826 > Add VariantVal for PySpark > -- > > Key: SPARK-47826 > URL: https://issues.apache.org/jira/browse/SPARK-47826 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Gene Pang >Priority: Major > Fix For: 4.0.0 > > > Add a `VariantVal` implementation for PySpark. It includes convenience > methods to convert the Variant to a string, or to a Python object, so that > users can more easily work with Variant data. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47811) Run ML tests for pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47811. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45941 [https://github.com/apache/spark/pull/45941] > Run ML tests for pyspark-connect package > > > Key: SPARK-47811 > URL: https://issues.apache.org/jira/browse/SPARK-47811 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47811) Run ML tests for pyspark-connect package
[ https://issues.apache.org/jira/browse/SPARK-47811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47811: Assignee: Hyukjin Kwon > Run ML tests for pyspark-connect package > > > Key: SPARK-47811 > URL: https://issues.apache.org/jira/browse/SPARK-47811 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47811) Run ML tests for pyspark-connect package
Hyukjin Kwon created SPARK-47811: Summary: Run ML tests for pyspark-connect package Key: SPARK-47811 URL: https://issues.apache.org/jira/browse/SPARK-47811 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47807) Make pyspark.ml compatible with pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47807. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45995 [https://github.com/apache/spark/pull/45995] > Make pyspark.ml compatible with pyspark-connect > > > Key: SPARK-47807 > URL: https://issues.apache.org/jira/browse/SPARK-47807 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47704) JSON parsing fails with "java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to org.apache.spark.sql.catalyst.util.ArrayDa
[ https://issues.apache.org/jira/browse/SPARK-47704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47704: Assignee: Ivan Sadikov > JSON parsing fails with "java.lang.ClassCastException: > org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to > org.apache.spark.sql.catalyst.util.ArrayData" when > spark.sql.json.enablePartialResults is enabled > --- > > Key: SPARK-47704 > URL: https://issues.apache.org/jira/browse/SPARK-47704 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > > When reading the following JSON \{"a":[{"key":{"b":0}}]}: > {code:java} > val df = spark.read.schema("a array<map<string, struct<b boolean>>>").json(path){code} > Spark throws exception: > {code:java} > Cause: java.lang.ClassCastException: class > org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to class > org.apache.spark.sql.catalyst.util.ArrayData > (org.apache.spark.sql.catalyst.util.ArrayBasedMapData and > org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader > 'app') > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:53) > at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:172) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:605) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:884) > at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) {code} > > The same happens for map: \{"a":{"key":[{"b":0}]}} when array and map types > are swapped. > {code:java} > val df = spark.read.schema("a map<string, array<struct<b boolean>>>").json(path) {code} > > This is a corner case that https://issues.apache.org/jira/browse/SPARK-44940 > missed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47704) JSON parsing fails with "java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to org.apache.spark.sql.catalyst.util.ArrayDa
[ https://issues.apache.org/jira/browse/SPARK-47704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47704. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45833 [https://github.com/apache/spark/pull/45833] > JSON parsing fails with "java.lang.ClassCastException: > org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to > org.apache.spark.sql.catalyst.util.ArrayData" when > spark.sql.json.enablePartialResults is enabled > --- > > Key: SPARK-47704 > URL: https://issues.apache.org/jira/browse/SPARK-47704 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When reading the following JSON \{"a":[{"key":{"b":0}}]}: > {code:java} > val df = spark.read.schema("a array<map<string, struct<b boolean>>>").json(path){code} > Spark throws exception: > {code:java} > Cause: java.lang.ClassCastException: class > org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to class > org.apache.spark.sql.catalyst.util.ArrayData > (org.apache.spark.sql.catalyst.util.ArrayBasedMapData and > org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader > 'app') > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:53) > at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:172) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:605) > at 
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:884) > at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) {code} > > The same happens for map: \{"a":{"key":[{"b":0}]}} when array and map types > are swapped. > {code:java} > val df = spark.read.schema("a map<string, array<struct<b boolean>>>").json(path) {code} > > This is a corner case that https://issues.apache.org/jira/browse/SPARK-44940 > missed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41811) Implement SparkSession.sql's string formatter
[ https://issues.apache.org/jira/browse/SPARK-41811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41811. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45614 [https://github.com/apache/spark/pull/45614] > Implement SparkSession.sql's string formatter > - > > Key: SPARK-41811 > URL: https://issues.apache.org/jira/browse/SPARK-41811 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > ** > File "/.../spark/python/pyspark/sql/connect/session.py", line 345, in > pyspark.sql.connect.session.SparkSession.sql > Failed example: > spark.sql( > "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}", > bound1=7, bound2=9 > ).show() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > spark.sql( > TypeError: sql() got an unexpected keyword argument 'bound1' > ** > File "/.../spark/python/pyspark/sql/connect/session.py", line 355, in > pyspark.sql.connect.session.SparkSession.sql > Failed example: > spark.sql( > "SELECT {col} FROM {mydf} WHERE id IN {x}", > col=mydf.id, mydf=mydf, x=tuple(range(4))).show() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > spark.sql( > TypeError: sql() got an unexpected keyword argument 'col' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41811) Implement SparkSession.sql's string formatter
[ https://issues.apache.org/jira/browse/SPARK-41811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41811: Assignee: Ruifeng Zheng > Implement SparkSession.sql's string formatter > - > > Key: SPARK-41811 > URL: https://issues.apache.org/jira/browse/SPARK-41811 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > > {code} > ** > File "/.../spark/python/pyspark/sql/connect/session.py", line 345, in > pyspark.sql.connect.session.SparkSession.sql > Failed example: > spark.sql( > "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}", > bound1=7, bound2=9 > ).show() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > spark.sql( > TypeError: sql() got an unexpected keyword argument 'bound1' > ** > File "/.../spark/python/pyspark/sql/connect/session.py", line 355, in > pyspark.sql.connect.session.SparkSession.sql > Failed example: > spark.sql( > "SELECT {col} FROM {mydf} WHERE id IN {x}", > col=mydf.id, mydf=mydf, x=tuple(range(4))).show() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > spark.sql( > TypeError: sql() got an unexpected keyword argument 'col' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
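Conceptually, the `SparkSession.sql` formatter in SPARK-41811 forwards keyword arguments into the query text the way `str.format` does, rendering each value as a SQL fragment. A simplified pure-Python sketch of that substitution step (`format_sql` and `to_literal` are hypothetical helpers; the real implementation also accepts DataFrames and columns and handles quoting far more carefully):

```python
def to_literal(v):
    """Render a Python value as an illustrative SQL literal."""
    if isinstance(v, str):
        # double any embedded single quotes, then wrap in quotes
        return "'" + v.replace("'", "''") + "'"
    if isinstance(v, tuple):
        # tuples become parenthesized lists, e.g. for IN (...)
        return "(" + ", ".join(to_literal(x) for x in v) + ")"
    return str(v)

def format_sql(query: str, **kwargs) -> str:
    """Substitute {name} placeholders with SQL-rendered keyword arguments."""
    return query.format(**{k: to_literal(v) for k, v in kwargs.items()})

sql = format_sql(
    "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}",
    bound1=7, bound2=9,
)
assert sql == "SELECT * FROM range(10) WHERE id > 7 AND id < 9"
```

The failing doctests above were simply exercising this keyword path before Spark Connect's `sql()` accepted keyword arguments at all.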
[jira] [Created] (SPARK-47763) Reenable Protobuf function doctests
Hyukjin Kwon created SPARK-47763: Summary: Reenable Protobuf function doctests Key: SPARK-47763 URL: https://issues.apache.org/jira/browse/SPARK-47763 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py
[ https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47762. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45924 [https://github.com/apache/spark/pull/45924] > Add pyspark.sql.connect.protobuf into setup.py > -- > > Key: SPARK-47762 > URL: https://issues.apache.org/jira/browse/SPARK-47762 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should add them. They are missing in the PyPI package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py
[ https://issues.apache.org/jira/browse/SPARK-47762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47762: - Fix Version/s: 3.5.2 > Add pyspark.sql.connect.protobuf into setup.py > -- > > Key: SPARK-47762 > URL: https://issues.apache.org/jira/browse/SPARK-47762 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > We should add them. They are missing in the PyPI package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47762) Add pyspark.sql.connect.protobuf into setup.py
Hyukjin Kwon created SPARK-47762: Summary: Add pyspark.sql.connect.protobuf into setup.py Key: SPARK-47762 URL: https://issues.apache.org/jira/browse/SPARK-47762 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 3.5.1, 4.0.0 Reporter: Hyukjin Kwon We should add them. They are missing in the PyPI package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47760) Reenable Avro function doctests
Hyukjin Kwon created SPARK-47760: Summary: Reenable Avro function doctests Key: SPARK-47760 URL: https://issues.apache.org/jira/browse/SPARK-47760 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47756) Reenable UDFProfilerParityTests for pyspark-connect
Hyukjin Kwon created SPARK-47756: Summary: Reenable UDFProfilerParityTests for pyspark-connect Key: SPARK-47756 URL: https://issues.apache.org/jira/browse/SPARK-47756 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47757) Reenable ResourceProfileTests for pyspark-connect
Hyukjin Kwon created SPARK-47757: Summary: Reenable ResourceProfileTests for pyspark-connect Key: SPARK-47757 URL: https://issues.apache.org/jira/browse/SPARK-47757 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47755) Pivot should fail when the number of distinct values is too large
[ https://issues.apache.org/jira/browse/SPARK-47755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47755. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45918 [https://github.com/apache/spark/pull/45918] > Pivot should fail when the number of distinct values is too large > - > > Key: SPARK-47755 > URL: https://issues.apache.org/jira/browse/SPARK-47755 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
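The SPARK-47755 guard amounts to counting distinct pivot values before expanding them into output columns and failing fast past a threshold (classic Spark caps this with `spark.sql.pivotMaxValues`). A plain-Python sketch of the check; the default limit and the error text here are illustrative, not Spark's actual message:

```python
def pivot_columns(values, max_values=10000):
    """Return the sorted distinct pivot values, or fail if there are too many."""
    distinct = sorted(set(values))
    if len(distinct) > max_values:
        # fail before materializing one output column per distinct value
        raise ValueError(
            f"pivot found {len(distinct)} distinct values; "
            f"limit is {max_values}"
        )
    return distinct

assert pivot_columns(["b", "a", "b"]) == ["a", "b"]
```

Failing here is preferable to silently generating an enormous (or unbounded) number of columns on the Connect server.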
[jira] [Resolved] (SPARK-47752) Make pyspark.pandas compatible with pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47752. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45915 [https://github.com/apache/spark/pull/45915] > Make pyspark.pandas compatible with pyspark-connect > --- > > Key: SPARK-47752 > URL: https://issues.apache.org/jira/browse/SPARK-47752 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47753) Make pyspark.testing compatible with pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47753: Assignee: Hyukjin Kwon > Make pyspark.testing compatible with pyspark-connect > > > Key: SPARK-47753 > URL: https://issues.apache.org/jira/browse/SPARK-47753 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47753) Make pyspark.testing compatible with pyspark-connect
[ https://issues.apache.org/jira/browse/SPARK-47753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47753. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45916 [https://github.com/apache/spark/pull/45916] > Make pyspark.testing compatible with pyspark-connect > > > Key: SPARK-47753 > URL: https://issues.apache.org/jira/browse/SPARK-47753 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47753) Make pyspark.testing compatible with pyspark-connect
Hyukjin Kwon created SPARK-47753: Summary: Make pyspark.testing compatible with pyspark-connect Key: SPARK-47753 URL: https://issues.apache.org/jira/browse/SPARK-47753 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47735) Make pyspark.testing.connectutils compatible with pyspark-connect
Hyukjin Kwon created SPARK-47735: Summary: Make pyspark.testing.connectutils compatible with pyspark-connect Key: SPARK-47735 URL: https://issues.apache.org/jira/browse/SPARK-47735 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47734) Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query
[ https://issues.apache.org/jira/browse/SPARK-47734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47734. -- Fix Version/s: 4.0.0 3.5.2 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/45885 > Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping > streaming query > - > > Key: SPARK-47734 > URL: https://issues.apache.org/jira/browse/SPARK-47734 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > https://issues.apache.org/jira/browse/SPARK-47199 didn't fix the flakiness in > the pyspark.sql.dataframe.DataFrame.writeStream doctest: the problem is not > that concurrent runs collide on the test directory but, rather, that the test > starts a background thread to write to a directory and then deletes that > directory from the main test thread, which is inherently race-prone. > The fix is simple: stop the streaming query in the doctest itself, as > other streaming doctest examples do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
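The race in SPARK-47734 and its fix generalize beyond Spark: a background writer must be stopped and joined before its output directory is deleted. A sketch with a plain thread standing in for the streaming query (all names are illustrative):

```python
import os
import shutil
import tempfile
import threading
import time

stop = threading.Event()

def writer(path):
    # stand-in for the streaming query's background write loop
    while not stop.is_set():
        with open(os.path.join(path, "part"), "w") as f:
            f.write("x")
        time.sleep(0.01)

d = tempfile.mkdtemp()
t = threading.Thread(target=writer, args=(d,))
t.start()
time.sleep(0.05)

stop.set()        # analogue of query.stop(): request no more writes...
t.join()          # ...and wait until the writer has actually finished
shutil.rmtree(d)  # only now is it safe to delete the directory
assert not os.path.exists(d)
```

Deleting `d` while the thread is still writing can intermittently fail (or resurrect the directory), which is exactly the flakiness the doctest exhibited.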
[jira] [Resolved] (SPARK-47565) PySpark workers dying in daemon mode idle queue fail query
[ https://issues.apache.org/jira/browse/SPARK-47565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47565. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45635 [https://github.com/apache/spark/pull/45635] > PySpark workers dying in daemon mode idle queue fail query > -- > > Key: SPARK-47565 > URL: https://issues.apache.org/jira/browse/SPARK-47565 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.2, 3.5.1, 3.3.4 >Reporter: Sebastian Hillig >Assignee: Nikita Awasthi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > PySpark workers may die after entering the idle queue in > `PythonWorkerFactory`. This may happen because of code that runs in the > process, or external factors. > When drawn from the warm pool, such a worker will result in an I/O exception > on the first read/write. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47565) PySpark workers dying in daemon mode idle queue fail query
[ https://issues.apache.org/jira/browse/SPARK-47565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47565: Assignee: Nikita Awasthi > PySpark workers dying in daemon mode idle queue fail query > -- > > Key: SPARK-47565 > URL: https://issues.apache.org/jira/browse/SPARK-47565 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.2, 3.5.1, 3.3.4 >Reporter: Sebastian Hillig >Assignee: Nikita Awasthi >Priority: Major > Labels: pull-request-available > > PySpark workers may die after entering the idle queue in > `PythonWorkerFactory`. This may happen because of code that runs in the > process, or external factors. > When drawn from the warm pool, such a worker will result in an I/O exception > on the first read/write. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
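The SPARK-47565 hazard can be sketched with a toy pool that checks liveness before reusing an idle worker. This is one defensive pattern for the problem; the actual fix in the linked PR may differ, and every name below is hypothetical rather than `PythonWorkerFactory` code:

```python
import collections

class FakeWorker:
    """Stand-in for a daemon-mode Python worker process."""
    def __init__(self, alive=True):
        self._alive = alive

    def alive(self):
        return self._alive

class WorkerPool:
    """Toy warm pool that discards workers that died while idle."""
    def __init__(self):
        self.idle = collections.deque()

    def release(self, worker):
        self.idle.append(worker)

    def acquire(self, spawn):
        while self.idle:
            w = self.idle.popleft()
            if w.alive():      # skip workers that died in the idle queue
                return w
        return spawn()         # pool exhausted: start a fresh worker

pool = WorkerPool()
pool.release(FakeWorker(alive=False))   # this one died while idle
pool.release(FakeWorker(alive=True))
w = pool.acquire(spawn=lambda: FakeWorker())
assert w.alive()
```

Without such a check (or a retry on the first failed read/write), the dead worker surfaces as the I/O exception described above.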
[jira] [Created] (SPARK-47727) Make SparkConf root-level for both SparkSession and SparkContext
Hyukjin Kwon created SPARK-47727: Summary: Make SparkConf root-level for both SparkSession and SparkContext Key: SPARK-47727 URL: https://issues.apache.org/jira/browse/SPARK-47727 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47725) Set up the CI for pyspark-connect package
Hyukjin Kwon created SPARK-47725: Summary: Set up the CI for pyspark-connect package Key: SPARK-47725 URL: https://issues.apache.org/jira/browse/SPARK-47725 Project: Spark Issue Type: Sub-task Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47724) Add an environment variable for testing remote pure Python library
[ https://issues.apache.org/jira/browse/SPARK-47724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47724. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45868 [https://github.com/apache/spark/pull/45868] > Add an environment variable for testing remote pure Python library > -- > > Key: SPARK-47724 > URL: https://issues.apache.org/jira/browse/SPARK-47724 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47724) Add an environment variable for testing remote pure Python library
[ https://issues.apache.org/jira/browse/SPARK-47724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47724: Assignee: Hyukjin Kwon > Add an environment variable for testing remote pure Python library > -- > > Key: SPARK-47724 > URL: https://issues.apache.org/jira/browse/SPARK-47724 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available >
[jira] [Created] (SPARK-47724) Add an environment variable for testing remote pure Python library
Hyukjin Kwon created SPARK-47724: Summary: Add an environment variable for testing remote pure Python library Key: SPARK-47724 URL: https://issues.apache.org/jira/browse/SPARK-47724 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-47683) Separate pure Python packaging
[ https://issues.apache.org/jira/browse/SPARK-47683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47683. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45053 [https://github.com/apache/spark/pull/45053] > Separate pure Python packaging > -- > > Key: SPARK-47683 > URL: https://issues.apache.org/jira/browse/SPARK-47683 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Initial version.