[jira] [Resolved] (SPARK-43979) CollectedMetrics should be treated as the same one for self-join
[ https://issues.apache.org/jira/browse/SPARK-43979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43979.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41475 [https://github.com/apache/spark/pull/41475]

> CollectedMetrics should be treated as the same one for self-join
>
> Key: SPARK-43979
> URL: https://issues.apache.org/jira/browse/SPARK-43979
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.5.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43717) Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
[ https://issues.apache.org/jira/browse/SPARK-43717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-43717.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41264 [https://github.com/apache/spark/pull/41264]

> Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
>
> Key: SPARK-43717
> URL: https://issues.apache.org/jira/browse/SPARK-43717
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Assignee: Zhen Li
> Priority: Major
> Fix For: 3.5.0
>
> Scala client failed with NPE when running:
> assert(spark.range(0, 5, 1, 10).as[Long].reduce(_ + _) == 10)
[jira] [Assigned] (SPARK-43717) Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
[ https://issues.apache.org/jira/browse/SPARK-43717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-43717:
Assignee: Zhen Li

> Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
>
> Key: SPARK-43717
> URL: https://issues.apache.org/jira/browse/SPARK-43717
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Assignee: Zhen Li
> Priority: Major
>
> Scala client failed with NPE when running:
> assert(spark.range(0, 5, 1, 10).as[Long].reduce(_ + _) == 10)
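The failure mode above can be sketched without Spark. This is an illustrative model only, not the actual Spark Connect code path: it assumes the bug comes from an empty partition contributing a null local result, which then NPEs when unboxed to a Scala primitive. Here `partition_reduce` and the partition layout are hypothetical names for the sketch.

```python
from functools import reduce

def partition_reduce(partitions, op):
    # Reduce each partition locally; an empty partition has no result,
    # modeled as None (the Scala client saw null here, hence the NPE).
    local_results = [reduce(op, p) if p else None for p in partitions]
    # Fix: drop empty-partition results before the final reduction,
    # instead of feeding None into the combining operator.
    non_empty = [r for r in local_results if r is not None]
    return reduce(op, non_empty)

# range(0, 5) split across 10 partitions leaves several partitions empty.
partitions = [[0], [1], [2], [3], [4], [], [], [], [], []]
print(partition_reduce(partitions, lambda a, b: a + b))  # 10
```

Without the `is not None` filter, the final `reduce` would combine `None` with an `int` and raise, which mirrors the reported assertion failure.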
[jira] [Created] (SPARK-43989) Add maven testing GA task for connect server module
Yang Jie created SPARK-43989:
Summary: Add maven testing GA task for connect server module
Key: SPARK-43989
URL: https://issues.apache.org/jira/browse/SPARK-43989
Project: Spark
Issue Type: Improvement
Components: Connect, Project Infra
Affects Versions: 3.5.0
Reporter: Yang Jie
[jira] [Updated] (SPARK-43988) Add maven testing GA task for connect client module
[ https://issues.apache.org/jira/browse/SPARK-43988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-43988:
Summary: Add maven testing GA task for connect client module (was: Add independent maven testing GA task for connect client module)

> Add maven testing GA task for connect client module
>
> Key: SPARK-43988
> URL: https://issues.apache.org/jira/browse/SPARK-43988
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Project Infra
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
[jira] [Created] (SPARK-43988) Add independent maven testing GA task for connect client module
Yang Jie created SPARK-43988:
Summary: Add independent maven testing GA task for connect client module
Key: SPARK-43988
URL: https://issues.apache.org/jira/browse/SPARK-43988
Project: Spark
Issue Type: Improvement
Components: Connect, Project Infra
Affects Versions: 3.5.0
Reporter: Yang Jie
[jira] [Created] (SPARK-43987) Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
SHU WANG created SPARK-43987:
Summary: Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
Key: SPARK-43987
URL: https://issues.apache.org/jira/browse/SPARK-43987
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 3.4.0, 3.2.0
Reporter: SHU WANG

In our production environment, _finalizeShuffleMerge_ processing takes longer (p90 is around 20s) than other RPC requests, because _finalizeShuffleMerge_ performs IO operations such as truncate and file open/close. More importantly, processing _finalizeShuffleMerge_ can block other critical lightweight messages such as authentication requests, which can cause authentication timeouts as well as fetch failures. These timeouts and fetch failures affect the stability of Spark job execution.
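The idea of isolating heavy requests on their own pool can be sketched as follows. This is a generic illustration of the pattern, not Spark's shuffle service code; the names `fast_pool`, `merge_pool`, and `dispatch` are invented for the sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Lightweight, latency-sensitive RPCs (e.g. authentication) share one pool;
# slow finalizeShuffleMerge-style requests get a dedicated pool so their
# IO work cannot occupy the threads that serve the fast path.
fast_pool = ThreadPoolExecutor(max_workers=4)
merge_pool = ThreadPoolExecutor(max_workers=2)

def authenticate(req):
    # Cheap handler: must never wait behind file IO.
    return f"auth-ok:{req}"

def finalize_shuffle_merge(req):
    # Heavy handler: stands in for truncate and file open/close work.
    time.sleep(0.05)
    return f"finalized:{req}"

def dispatch(req):
    # Route by request type instead of using one shared pool for everything.
    if req.startswith("merge"):
        return merge_pool.submit(finalize_shuffle_merge, req)
    return fast_pool.submit(authenticate, req)

futures = [dispatch(r) for r in ["merge-1", "auth-1", "auth-2"]]
results = [f.result() for f in futures]
print(results)
```

With a single shared pool, enough concurrent `merge-*` requests would exhaust the workers and delay `auth-*` responses past their timeout; the dedicated pool bounds that interference.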
[jira] [Resolved] (SPARK-43669) Fix BinaryOps.lt to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43669.
Fix Version/s: 3.5.0
Resolution: Fixed
Fixed in [https://github.com/apache/spark/pull/41305]

> Fix BinaryOps.lt to work with Spark Connect Column.
>
> Key: SPARK-43669
> URL: https://issues.apache.org/jira/browse/SPARK-43669
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Fix BinaryOps.lt to work with Spark Connect Column.
[jira] [Resolved] (SPARK-43668) Fix BinaryOps.le to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43668.
Fix Version/s: 3.5.0
Resolution: Fixed
Fixed in [https://github.com/apache/spark/pull/41305]

> Fix BinaryOps.le to work with Spark Connect Column.
>
> Key: SPARK-43668
> URL: https://issues.apache.org/jira/browse/SPARK-43668
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Fix BinaryOps.le to work with Spark Connect Column.
[jira] [Resolved] (SPARK-43672) Enable CategoricalOps.gt to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43672.
Fix Version/s: 3.5.0
Resolution: Fixed

> Enable CategoricalOps.gt to work with Spark Connect.
>
> Key: SPARK-43672
> URL: https://issues.apache.org/jira/browse/SPARK-43672
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable CategoricalOps.gt to work with Spark Connect.
[jira] [Updated] (SPARK-43673) Enable CategoricalOps.le to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43673:
Fix Version/s: 3.5.0

> Enable CategoricalOps.le to work with Spark Connect.
>
> Key: SPARK-43673
> URL: https://issues.apache.org/jira/browse/SPARK-43673
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable CategoricalOps.le to work with Spark Connect.
[jira] [Updated] (SPARK-43674) Enable CategoricalOps.lt to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43674:
Fix Version/s: 3.5.0

> Enable CategoricalOps.lt to work with Spark Connect.
>
> Key: SPARK-43674
> URL: https://issues.apache.org/jira/browse/SPARK-43674
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable CategoricalOps.lt to work with Spark Connect.
[jira] [Reopened] (SPARK-43672) Enable CategoricalOps.gt to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee reopened SPARK-43672:

> Enable CategoricalOps.gt to work with Spark Connect.
>
> Key: SPARK-43672
> URL: https://issues.apache.org/jira/browse/SPARK-43672
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable CategoricalOps.gt to work with Spark Connect.
[jira] [Resolved] (SPARK-43667) Fix BinaryOps.gt to work with Spark Connect Column.
[ https://issues.apache.org/jira/browse/SPARK-43667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43667.
Fix Version/s: 3.5.0
Assignee: Haejoon Lee
Resolution: Fixed
Fixed in https://github.com/apache/spark/pull/41305

> Fix BinaryOps.gt to work with Spark Connect Column.
>
> Key: SPARK-43667
> URL: https://issues.apache.org/jira/browse/SPARK-43667
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Fix BinaryOps.gt to work with Spark Connect Column.
[jira] [Resolved] (SPARK-43674) Enable CategoricalOps.lt to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43674.
Resolution: Fixed
This is resolved from https://github.com/apache/spark/pull/41310.

> Enable CategoricalOps.lt to work with Spark Connect.
>
> Key: SPARK-43674
> URL: https://issues.apache.org/jira/browse/SPARK-43674
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable CategoricalOps.lt to work with Spark Connect.
[jira] [Resolved] (SPARK-43673) Enable CategoricalOps.le to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43673.
Resolution: Fixed
This is resolved from https://github.com/apache/spark/pull/41310.

> Enable CategoricalOps.le to work with Spark Connect.
>
> Key: SPARK-43673
> URL: https://issues.apache.org/jira/browse/SPARK-43673
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable CategoricalOps.le to work with Spark Connect.
[jira] [Resolved] (SPARK-43672) Enable CategoricalOps.gt to work with Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-43672.
Resolution: Fixed
This is resolved from https://github.com/apache/spark/pull/41310.

> Enable CategoricalOps.gt to work with Spark Connect.
>
> Key: SPARK-43672
> URL: https://issues.apache.org/jira/browse/SPARK-43672
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Enable CategoricalOps.gt to work with Spark Connect.
[jira] [Resolved] (SPARK-43985) Spark protobuf enums.as.ints raises exception on repeated enum types
[ https://issues.apache.org/jira/browse/SPARK-43985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43985.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41481 [https://github.com/apache/spark/pull/41481]

> Spark protobuf enums.as.ints raises exception on repeated enum types
>
> Key: SPARK-43985
> URL: https://issues.apache.org/jira/browse/SPARK-43985
> Project: Spark
> Issue Type: Bug
> Components: Protobuf
> Affects Versions: 3.4.0
> Reporter: Parth Upadhyay
> Assignee: Parth Upadhyay
> Priority: Major
> Fix For: 3.5.0
>
> With `enums.as.ints` enabled, deserializing repeated enum fields currently raises an exception. We should fix this behavior so that repeated enum fields work correctly.
[jira] [Resolved] (SPARK-43901) Avro to Support custom decimal type backed by Long
[ https://issues.apache.org/jira/browse/SPARK-43901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43901.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41409 [https://github.com/apache/spark/pull/41409]

> Avro to Support custom decimal type backed by Long
>
> Key: SPARK-43901
> URL: https://issues.apache.org/jira/browse/SPARK-43901
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Fix For: 3.5.0
>
> Right now, Avro only allows the Decimal logical type in fixed and array types. However, users need to represent decimal in the long type, in order to represent currency (money). The request is to support a custom decimal type backed by long.
[jira] [Assigned] (SPARK-43901) Avro to Support custom decimal type backed by Long
[ https://issues.apache.org/jira/browse/SPARK-43901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43901:
Assignee: Siying Dong

> Avro to Support custom decimal type backed by Long
>
> Key: SPARK-43901
> URL: https://issues.apache.org/jira/browse/SPARK-43901
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
>
> Right now, Avro only allows the Decimal logical type in fixed and array types. However, users need to represent decimal in the long type, in order to represent currency (money). The request is to support a custom decimal type backed by long.
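The "decimal backed by long" encoding requested above can be sketched as an unscaled long plus a fixed scale. This is a generic illustration of the encoding, not the Avro or Spark implementation; the scale of 6 (micro-units) and the helper names are assumptions made for the example.

```python
from decimal import Decimal

SCALE = 6  # assumed fixed scale: store currency in micro-units

def to_long(amount: Decimal, scale: int = SCALE) -> int:
    # Shift the decimal point right by `scale` digits and keep the
    # unscaled integer, e.g. 19.99 -> 19990000 at scale 6.
    return int(amount.scaleb(scale))

def from_long(unscaled: int, scale: int = SCALE) -> Decimal:
    # Inverse: shift the decimal point back left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)

price = Decimal("19.99")
encoded = to_long(price)
print(encoded, from_long(encoded))  # 19990000 19.990000
```

Because the scale is fixed per column, arithmetic on the long values stays exact, which is why this layout is attractive for money.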
[jira] [Resolved] (SPARK-42750) Support INSERT INTO by name
[ https://issues.apache.org/jira/browse/SPARK-42750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42750.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 40908 [https://github.com/apache/spark/pull/40908]

> Support INSERT INTO by name
>
> Key: SPARK-42750
> URL: https://issues.apache.org/jira/browse/SPARK-42750
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Jose Torres
> Assignee: Jia Fan
> Priority: Major
> Fix For: 3.5.0
>
> In some use cases, users have incoming dataframes with fixed column names which might differ from the canonical order. Currently there's no way to handle this easily through the INSERT INTO API - the user has to make sure the columns are in the right order as they would when inserting a tuple. We should add an optional BY NAME clause, such that:
> INSERT INTO tgt BY NAME
> takes each column of and inserts it into the column in `tgt` which has the same name according to the configured `resolver` logic.
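The by-name resolution described above amounts to reordering the incoming columns to match the target table's schema by name rather than by position. The sketch below is a plain-Python illustration of that matching step, not Spark's analyzer; the case-insensitive lookup stands in for "the configured `resolver` logic" and the function name is invented.

```python
def align_by_name(target_cols, source_cols, row):
    # Build a name -> position index over the source columns
    # (case-insensitive, mimicking a typical default resolver).
    index = {c.lower(): i for i, c in enumerate(source_cols)}
    # Emit values in the target table's column order.
    return tuple(row[index[c.lower()]] for c in target_cols)

target = ["id", "name", "amount"]          # canonical table order
source = ["Name", "Amount", "Id"]          # incoming dataframe order
print(align_by_name(target, source, ("alice", 42, 7)))  # (7, 'alice', 42)
```

Positional INSERT would have written `"alice"` into `id`; by-name matching routes each value to the like-named column regardless of its position in the source.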
[jira] [Assigned] (SPARK-43615) Enable DataFrameSlowParityTests.test_eval
[ https://issues.apache.org/jira/browse/SPARK-43615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43615:
Assignee: Haejoon Lee

> Enable DataFrameSlowParityTests.test_eval
>
> Key: SPARK-43615
> URL: https://issues.apache.org/jira/browse/SPARK-43615
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
>
> Repro:
> {code:python}
> pdf = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})
> psdf = ps.from_pandas(pdf)
> pdf.eval("B = A + B // (100 + 200) * (500 - B) - 10.5")
> psdf.eval("B = A + B // (100 + 200) * (500 - B) - 10.5")
> {code}
[jira] [Resolved] (SPARK-43615) Enable DataFrameSlowParityTests.test_eval
[ https://issues.apache.org/jira/browse/SPARK-43615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43615.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41471 [https://github.com/apache/spark/pull/41471]

> Enable DataFrameSlowParityTests.test_eval
>
> Key: SPARK-43615
> URL: https://issues.apache.org/jira/browse/SPARK-43615
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Repro:
> {code:python}
> pdf = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})
> psdf = ps.from_pandas(pdf)
> pdf.eval("B = A + B // (100 + 200) * (500 - B) - 10.5")
> psdf.eval("B = A + B // (100 + 200) * (500 - B) - 10.5")
> {code}
[jira] [Assigned] (SPARK-43930) Add unix_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43930:
Assignee: BingKun Pan

> Add unix_* functions to Scala and Python
>
> Key: SPARK-43930
> URL: https://issues.apache.org/jira/browse/SPARK-43930
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: BingKun Pan
> Priority: Major
>
> Add following functions:
> * unix_date
> * unix_micros
> * unix_millis
> * unix_seconds
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
[jira] [Resolved] (SPARK-43930) Add unix_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43930.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41463 [https://github.com/apache/spark/pull/41463]

> Add unix_* functions to Scala and Python
>
> Key: SPARK-43930
> URL: https://issues.apache.org/jira/browse/SPARK-43930
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: BingKun Pan
> Priority: Major
> Fix For: 3.5.0
>
> Add following functions:
> * unix_date
> * unix_micros
> * unix_millis
> * unix_seconds
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
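The semantics implied by the four function names above (days, seconds, milliseconds, and microseconds since the Unix epoch) can be sketched in plain Python. This is my reading of the names based on the corresponding SQL functions, not code taken from the PR, and the helpers are stand-ins for illustration only.

```python
from datetime import date, datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def unix_date(d: date) -> int:
    # Whole days since 1970-01-01 (applies to DATE values).
    return (d - date(1970, 1, 1)).days

def unix_micros(ts: datetime) -> int:
    # Microseconds since the epoch, computed from the timedelta parts
    # to avoid float rounding in total_seconds().
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

def unix_millis(ts: datetime) -> int:
    return unix_micros(ts) // 1_000

def unix_seconds(ts: datetime) -> int:
    return unix_micros(ts) // 1_000_000

print(unix_date(date(1970, 1, 2)), unix_seconds(EPOCH))  # 1 0
```

Deriving millis and seconds from the micros value keeps the three timestamp variants consistent with each other by construction.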
[jira] [Assigned] (SPARK-43356) Migrate deprecated createOrReplace to serverSideApply
[ https://issues.apache.org/jira/browse/SPARK-43356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43356:
Assignee: Cheng Pan

> Migrate deprecated createOrReplace to serverSideApply
>
> Key: SPARK-43356
> URL: https://issues.apache.org/jira/browse/SPARK-43356
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.5.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
>
> {code:java}
> public interface CreateOrReplaceable extends Replaceable {
>   /**
>    * Creates a provided resource in a Kubernetes Cluster. If creation
>    * fails with a HTTP_CONFLICT, it tries to replace resource.
>    *
>    * @return created item returned in kubernetes api response
>    *
>    * @deprecated please use {@link ServerSideApplicable#serverSideApply()} or attempt a create and edit/patch operation.
>    */
>   @Deprecated
>   T createOrReplace();
>
>   /**
>    * Creates an item
>    *
>    * @return the item from the api server
>    */
>   T create();
> }
> {code}
[jira] [Resolved] (SPARK-43356) Migrate deprecated createOrReplace to serverSideApply
[ https://issues.apache.org/jira/browse/SPARK-43356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43356.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41136 [https://github.com/apache/spark/pull/41136]

> Migrate deprecated createOrReplace to serverSideApply
>
> Key: SPARK-43356
> URL: https://issues.apache.org/jira/browse/SPARK-43356
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.5.0
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Fix For: 3.5.0
>
> {code:java}
> public interface CreateOrReplaceable extends Replaceable {
>   /**
>    * Creates a provided resource in a Kubernetes Cluster. If creation
>    * fails with a HTTP_CONFLICT, it tries to replace resource.
>    *
>    * @return created item returned in kubernetes api response
>    *
>    * @deprecated please use {@link ServerSideApplicable#serverSideApply()} or attempt a create and edit/patch operation.
>    */
>   @Deprecated
>   T createOrReplace();
>
>   /**
>    * Creates an item
>    *
>    * @return the item from the api server
>    */
>   T create();
> }
> {code}
[jira] [Resolved] (SPARK-43906) Implement the file support in SparkSession.addArtifacts
[ https://issues.apache.org/jira/browse/SPARK-43906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43906.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41415 [https://github.com/apache/spark/pull/41415]

> Implement the file support in SparkSession.addArtifacts
>
> Key: SPARK-43906
> URL: https://issues.apache.org/jira/browse/SPARK-43906
> Project: Spark
> Issue Type: New Feature
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.5.0
>
> Related to SPARK-42748, SPARK-43747 and SPARK-43612. We should also make SparkSession.addArtifacts work with regular files.
[jira] [Assigned] (SPARK-43906) Implement the file support in SparkSession.addArtifacts
[ https://issues.apache.org/jira/browse/SPARK-43906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43906:
Assignee: Hyukjin Kwon

> Implement the file support in SparkSession.addArtifacts
>
> Key: SPARK-43906
> URL: https://issues.apache.org/jira/browse/SPARK-43906
> Project: Spark
> Issue Type: New Feature
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
>
> Related to SPARK-42748, SPARK-43747 and SPARK-43612. We should also make SparkSession.addArtifacts work with regular files.
[jira] [Assigned] (SPARK-43970) Hide unsupported dataframe methods from auto-completion
[ https://issues.apache.org/jira/browse/SPARK-43970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43970:
Assignee: Ruifeng Zheng

> Hide unsupported dataframe methods from auto-completion
>
> Key: SPARK-43970
> URL: https://issues.apache.org/jira/browse/SPARK-43970
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
[jira] [Resolved] (SPARK-43970) Hide unsupported dataframe methods from auto-completion
[ https://issues.apache.org/jira/browse/SPARK-43970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43970.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41462 [https://github.com/apache/spark/pull/41462]

> Hide unsupported dataframe methods from auto-completion
>
> Key: SPARK-43970
> URL: https://issues.apache.org/jira/browse/SPARK-43970
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.5.0
[jira] [Created] (SPARK-43986) Add error classes for HyperLogLog functions
Daniel created SPARK-43986:
Summary: Add error classes for HyperLogLog functions
Key: SPARK-43986
URL: https://issues.apache.org/jira/browse/SPARK-43986
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Daniel
[jira] [Assigned] (SPARK-43893) StructType input/output support in Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-43893:
Assignee: Xinrong Meng

> StructType input/output support in Arrow-optimized Python UDF
>
> Key: SPARK-43893
> URL: https://issues.apache.org/jira/browse/SPARK-43893
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Resolved] (SPARK-43893) StructType input/output support in Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-43893.
Fix Version/s: 3.5.0
Resolution: Fixed
Issue resolved by pull request 41321 [https://github.com/apache/spark/pull/41321]

> StructType input/output support in Arrow-optimized Python UDF
>
> Key: SPARK-43893
> URL: https://issues.apache.org/jira/browse/SPARK-43893
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.5.0
[jira] [Comment Edited] (SPARK-36277) Issue with record count of data frame while reading in DropMalformed mode
[ https://issues.apache.org/jira/browse/SPARK-36277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729825#comment-17729825 ] Zach Liu edited comment on SPARK-36277 at 6/6/23 6:55 PM: -- I see the same behavior on Spark 3.3.1. I have to create this "checkpoint": {code:python} spark.conf.set( "spark.sql.optimizer.excludedRules", "org.apache.spark.sql.catalyst.optimizer.ColumnPruning", ) true_count = df.count() spark.conf.set("spark.sql.optimizer.excludedRules", "null") all_count = df.count() malformed_count = all_count - true_count if malformed_count > 0: raise ValueError("Self-defined schema is not compatible with the data") {code} [~fchen] I don't know if disabling `ColumnPruning` has other implications, so I just re-enable it. was (Author: zach liu): I see the same behavior on Spark 3.3.1. I have to create this "checkpoint": {code:python} spark.sql("set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ColumnPruning") true_count = df.count() spark.sql("set spark.sql.optimizer.excludedRules=null") all_count = df.count() malformed_count = all_count - true_count if malformed_count > 0: raise ValueError("Self-defined schema is not compatible with the data") {code} [~fchen] I don't know if disabling `ColumnPruning` has other implications, so I just re-enable it. > Issue with record count of data frame while reading in DropMalformed mode > - > > Key: SPARK-36277 > URL: https://issues.apache.org/jira/browse/SPARK-36277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: anju >Priority: Major > Attachments: 111.PNG, Inputfile.PNG, sample.csv > > > These are the steps to reproduce the issue with the "count" pyspark api while > using mode 'dropmalformed'. > I have a sample csv file in an s3 bucket. I am reading the file using the pyspark > csv api, "without schema" and "with schema using > mode 'dropmalformed'", into two different dataframes.
While displaying > the "with schema using mode 'dropmalformed'" dataframe, the display looks > good: it does not show the malformed records. But when we apply the count api on > the dataframe, it gives the record count of the actual file. I am expecting it > to give the valid record count. > Here is the code used: > {code} > without_schema_df=spark.read.csv("s3://noa-poc-lakeformation/data/test_files/sample.csv",header=True) > schema = StructType([ \ > StructField("firstname",StringType(),True), \ > StructField("middlename",StringType(),True), \ > StructField("lastname",StringType(),True), \ > StructField("id", StringType(), True), \ > StructField("gender", StringType(), True), \ > StructField("salary", IntegerType(), True) \ > ]) > with_schema_df = > spark.read.csv("s3://noa-poc-lakeformation/data/test_files/sample.csv",header=True,schema=schema,mode="DROPMALFORMED") > print("The dataframe with schema") > with_schema_df.show() > print("The dataframe without schema") > without_schema_df.show() > cnt_with_schema=with_schema_df.count() > print("The records count from with schema df :"+str(cnt_with_schema)) > cnt_without_schema=without_schema_df.count() > print("The records count from without schema df: "+str(cnt_without_schema)) > {code} > In the attached screenshots, 111.PNG shows the output of the code and > inputfile.csv is the input to the code > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
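The workaround in the comment above excludes the ColumnPruning optimizer rule so that count() is forced to actually parse every column. The DROPMALFORMED semantics the reporter expected can be illustrated without Spark at all. A minimal plain-Python sketch — the `drop_malformed` helper is hypothetical, not Spark's implementation: a row is malformed when any typed field fails to parse, and the count after dropping should reflect only the surviving rows.

```python
# Hypothetical illustration of DROPMALFORMED counting semantics, no Spark
# required: a row is "malformed" if any field fails its schema cast.
def drop_malformed(rows, casts):
    """Keep only rows whose every field parses under the given casts."""
    kept = []
    for row in rows:
        try:
            kept.append([cast(value) for cast, value in zip(casts, row)])
        except (ValueError, TypeError):
            continue  # malformed row: silently dropped, as the mode promises
    return kept

rows = [
    ["james", "smith", "36636", "M", "3000"],
    ["michael", "rose", "40288", "M", "not-a-number"],  # malformed salary
    ["robert", "williams", "42114", "M", "4000"],
]
casts = [str, str, str, str, int]  # last column plays the IntegerType role
valid = drop_malformed(rows, casts)
assert len(valid) == 2  # the count reflects the dropped row
```

The thread suggests the Spark bug is exactly the gap between these two behaviors: column pruning lets count() skip the per-column parse that would have dropped the malformed rows.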
[jira] [Assigned] (SPARK-43959) Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract
[ https://issues.apache.org/jira/browse/SPARK-43959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43959: - Assignee: Anton Okolnychyi > Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract > -- > > Key: SPARK-43959 > URL: https://issues.apache.org/jira/browse/SPARK-43959 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43959) Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract
[ https://issues.apache.org/jira/browse/SPARK-43959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43959. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41449 [https://github.com/apache/spark/pull/41449] > Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract > -- > > Key: SPARK-43959 > URL: https://issues.apache.org/jira/browse/SPARK-43959 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.5.0 > > > Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43976) Handle the case where modifiedConfigs doesn't exist in event logs
[ https://issues.apache.org/jira/browse/SPARK-43976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43976: - Assignee: Dongjoon Hyun > Handle the case where modifiedConfigs doesn't exist in event logs > - > > Key: SPARK-43976 > URL: https://issues.apache.org/jira/browse/SPARK-43976 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43976) Handle the case where modifiedConfigs doesn't exist in event logs
[ https://issues.apache.org/jira/browse/SPARK-43976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43976. --- Fix Version/s: 3.3.3 3.5.0 3.4.1 Resolution: Fixed Issue resolved by pull request 41472 [https://github.com/apache/spark/pull/41472] > Handle the case where modifiedConfigs doesn't exist in event logs > - > > Key: SPARK-43976 > URL: https://issues.apache.org/jira/browse/SPARK-43976 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.3, 3.5.0, 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43919) Extract JSON functionality out of Row
[ https://issues.apache.org/jira/browse/SPARK-43919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-43919. --- Fix Version/s: 3.5.0 Resolution: Fixed > Extract JSON functionality out of Row > -- > > Key: SPARK-43919 > URL: https://issues.apache.org/jira/browse/SPARK-43919 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43984) Change to use foreach when map doesn't produce results
Yang Jie created SPARK-43984: Summary: Change to use foreach when map doesn't produce results Key: SPARK-43984 URL: https://issues.apache.org/jira/browse/SPARK-43984 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Yang Jie Seq(1, 2).map(println) -> Seq(1, 2).foreach(println) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
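The change this ticket proposes for Scala — `Seq(1, 2).map(println)` becoming `Seq(1, 2).foreach(println)` — has a direct analog in plain Python, sketched here purely for illustration: `map` allocates a result collection that nobody reads, while a plain loop performs only the side effect.

```python
# Illustration of the map-vs-foreach distinction behind SPARK-43984,
# in plain Python rather than Scala.
printed = []

def log(x):
    printed.append(x)  # side effect only; the function returns None

values = [1, 2]

# Wasteful: materializes [None, None] just to throw it away.
discarded = list(map(log, values))
assert discarded == [None, None]

printed.clear()

# Idiomatic when only the side effect matters: a plain loop
# (Python's equivalent of Scala's foreach).
for v in values:
    log(v)
assert printed == [1, 2]
```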
[jira] [Commented] (SPARK-43980) Add support for EXCEPT in select clause, similar to what databricks provides
[ https://issues.apache.org/jira/browse/SPARK-43980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729773#comment-17729773 ] Yuming Wang commented on SPARK-43980: - Spark SQL current supports regex column specification, similar to EXCEPT: https://github.com/apache/spark/blob/2cbfc975ba937a4eb761de7a6473b7747941f386/sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql#L19-L33 > Add support for EXCEPT in select clause, similar to what databricks provides > > > Key: SPARK-43980 > URL: https://issues.apache.org/jira/browse/SPARK-43980 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yash Kothari >Priority: Major > > I'm looking for a way to incorporate the {{select * except(col1, ...)}} > clause provided by Databricks into my workflow. I don't use Databricks and > would like to introduce this {{select except}} clause either as a > spark-package or by contributing a change to Spark. > However, I'm unsure about how to begin this process and would appreciate any > guidance from the community. > [https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html#examples] > > Thank you for your assistance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
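Besides the regex column specification mentioned in the comment above, a common workaround until a native `SELECT * EXCEPT(...)` lands is to compute the projection list on the driver from `df.columns` and pass it to `select`. A minimal sketch of that column filtering in plain Python — `select_except` is a hypothetical helper, not an existing Spark API:

```python
# Hypothetical helper emulating Databricks' SELECT * EXCEPT(...) by
# filtering the column list before projection.
def select_except(columns, excluded):
    """Return the columns to project, minus the excluded ones."""
    missing = set(excluded) - set(columns)
    if missing:
        raise ValueError(f"unknown columns: {sorted(missing)}")
    excluded_set = set(excluded)
    return [c for c in columns if c not in excluded_set]

cols = ["id", "firstname", "lastname", "salary"]
assert select_except(cols, ["salary"]) == ["id", "firstname", "lastname"]
```

With a real DataFrame this would be used as `df.select(*select_except(df.columns, ["col1"]))`.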
[jira] [Assigned] (SPARK-43977) bad case of connect-jvm-client-mima-check
[ https://issues.apache.org/jira/browse/SPARK-43977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-43977: Assignee: Yang Jie > bad case of connect-jvm-client-mima-check > - > > Key: SPARK-43977 > URL: https://issues.apache.org/jira/browse/SPARK-43977 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > run > {code} > build/sbt "protobuf/clean" > dev/connect-jvm-client-mima-check > {code} > {code:java} > Using SPARK_LOCAL_IP=localhost > Using SPARK_LOCAL_IP=localhost > Do connect-client-jvm module mima check ... > Failed to find the jar: spark-protobuf-assembly(.*).jar or > spark-protobuf(.*)3.5.0-SNAPSHOT.jar inside folder: > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/protobuf/target. > This file can be generated by similar to the following command: build/sbt > package|assembly > finish connect-client-jvm module mima check ... > connect-client-jvm module mima check passed. > {code} > The check result is wrong: the error message is printed, yet the check > still reports success. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43977) bad case of connect-jvm-client-mima-check
[ https://issues.apache.org/jira/browse/SPARK-43977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-43977. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41473 [https://github.com/apache/spark/pull/41473] > bad case of connect-jvm-client-mima-check > - > > Key: SPARK-43977 > URL: https://issues.apache.org/jira/browse/SPARK-43977 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > > run > {code} > build/sbt "protobuf/clean" > dev/connect-jvm-client-mima-check > {code} > {code:java} > Using SPARK_LOCAL_IP=localhost > Using SPARK_LOCAL_IP=localhost > Do connect-client-jvm module mima check ... > Failed to find the jar: spark-protobuf-assembly(.*).jar or > spark-protobuf(.*)3.5.0-SNAPSHOT.jar inside folder: > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/protobuf/target. > This file can be generated by similar to the following command: build/sbt > package|assembly > finish connect-client-jvm module mima check ... > connect-client-jvm module mima check passed. > {code} > The check result is wrong: the error message is printed, yet the check > still reports success. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43939) Add try_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729752#comment-17729752 ] BingKun Pan commented on SPARK-43939: - I work on it. > Add try_* functions to Scala and Python > --- > > Key: SPARK-43939 > URL: https://issues.apache.org/jira/browse/SPARK-43939 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * try_add > * try_avg > * try_divide > * try_element_at > * try_multiply > * try_subtract > * try_sum > * try_to_binary > * try_to_number > * try_to_timestamp > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
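The `try_*` functions listed above share one contract: return NULL instead of raising when the operation fails on invalid input. A plain-Python sketch of that contract for `try_divide`, using `None` for NULL (illustrative only; Spark's actual implementation differs):

```python
# Sketch of the try_* contract: failures yield NULL (None) rather than
# an exception. try_divide is the example; the other try_* functions
# listed in the ticket follow the same pattern for their own error cases.
def try_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None  # NULL instead of raising

assert try_divide(6, 3) == 2.0
assert try_divide(1, 0) is None
```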
[jira] [Resolved] (SPARK-43097) Implement pyspark ML logistic regression estimator on top of torch distributor
[ https://issues.apache.org/jira/browse/SPARK-43097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu resolved SPARK-43097. Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41383 [https://github.com/apache/spark/pull/41383] > Implement pyspark ML logistic regression estimator on top of torch distributor > -- > > Key: SPARK-43097 > URL: https://issues.apache.org/jira/browse/SPARK-43097 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43510) Spark application hangs when YarnAllocator adds running executors after processing completed containers
[ https://issues.apache.org/jira/browse/SPARK-43510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-43510. --- Fix Version/s: 3.4.1 3.5.0 Assignee: Manu Zhang Resolution: Fixed > Spark application hangs when YarnAllocator adds running executors after > processing completed containers > --- > > Key: SPARK-43510 > URL: https://issues.apache.org/jira/browse/SPARK-43510 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.4.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > I see application hangs when containers are preempted immediately after > allocation as follows. > {code:java} > 23/05/14 09:11:33 INFO YarnAllocator: Launching container > container_e3812_1684033797982_57865_01_000382 on host > hdc42-mcc10-01-0910-4207-015-tess0028.stratus.rno.ebay.com for executor with > ID 277 for ResourceProfile Id 0 > 23/05/14 09:11:33 WARN YarnAllocator: Cannot find executorId for container: > container_e3812_1684033797982_57865_01_000382 > 23/05/14 09:11:33 INFO YarnAllocator: Completed container > container_e3812_1684033797982_57865_01_000382 (state: COMPLETE, exit status: > -102) > 23/05/14 09:11:33 INFO YarnAllocator: Container > container_e3812_1684033797982_57865_01_000382 was preempted.{code} > Note the warning log where YarnAllocator cannot find executorId for the > container when processing completed containers. The only plausible cause is > YarnAllocator added the running executor after processing completed > containers. The former happens in a separate thread after executor launch. > YarnAllocator believes there are still running executors, although they are > already lost due to preemption. Hence, the application hangs without any > running executors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API
[ https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729735#comment-17729735 ] Yang Jie commented on SPARK-43907: -- [~ivoson] feel free to pick up any of the ones you are interested in > Add SQL functions into Scala, Python and R API > -- > > Key: SPARK-43907 > URL: https://issues.apache.org/jira/browse/SPARK-43907 > Project: Spark > Issue Type: Umbrella > Components: PySpark, SparkR, SQL >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > > See the discussion in the dev mailing list > (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg). > This is an umbrella JIRA to implement all SQL functions in Scala, Python and R -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43982) Implement pipeline estimator
[ https://issues.apache.org/jira/browse/SPARK-43982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-43982: -- Assignee: Weichen Xu > Implement pipeline estimator > > > Key: SPARK-43982 > URL: https://issues.apache.org/jira/browse/SPARK-43982 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43982) Implement pipeline estimator
Weichen Xu created SPARK-43982: -- Summary: Implement pipeline estimator Key: SPARK-43982 URL: https://issues.apache.org/jira/browse/SPARK-43982 Project: Spark Issue Type: Sub-task Components: Connect, ML, PySpark Affects Versions: 3.5.0 Reporter: Weichen Xu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43983) Implement cross validator estimator
[ https://issues.apache.org/jira/browse/SPARK-43983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-43983: -- Assignee: Weichen Xu > Implement cross validator estimator > --- > > Key: SPARK-43983 > URL: https://issues.apache.org/jira/browse/SPARK-43983 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43983) Implement cross validator estimator
Weichen Xu created SPARK-43983: -- Summary: Implement cross validator estimator Key: SPARK-43983 URL: https://issues.apache.org/jira/browse/SPARK-43983 Project: Spark Issue Type: Sub-task Components: Connect, ML, PySpark Affects Versions: 3.5.0 Reporter: Weichen Xu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43981) Basic saving / loading implementation
[ https://issues.apache.org/jira/browse/SPARK-43981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-43981: --- Description: Support saving/loading for estimator / transformer / evaluator / model. We have some design goals: * The model format is decoupled from spark, i.e. we can run model inference without a spark service. * We can save the model to either a local file system or a cloud storage file system. > Basic saving / loading implementation > - > > Key: SPARK-43981 > URL: https://issues.apache.org/jira/browse/SPARK-43981 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > Support saving/loading for estimator / transformer / evaluator / model. > We have some design goals: > * The model format is decoupled from spark, i.e. we can run model inference > without a spark service. > * We can save the model to either a local file system or a cloud storage file system. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43981) Basic saving / loading implementation
Weichen Xu created SPARK-43981: -- Summary: Basic saving / loading implementation Key: SPARK-43981 URL: https://issues.apache.org/jira/browse/SPARK-43981 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Weichen Xu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43981) Basic saving / loading implementation
[ https://issues.apache.org/jira/browse/SPARK-43981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-43981: -- Assignee: Weichen Xu > Basic saving / loading implementation > - > > Key: SPARK-43981 > URL: https://issues.apache.org/jira/browse/SPARK-43981 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43981) Basic saving / loading implementation
[ https://issues.apache.org/jira/browse/SPARK-43981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-43981: --- Component/s: Connect ML > Basic saving / loading implementation > - > > Key: SPARK-43981 > URL: https://issues.apache.org/jira/browse/SPARK-43981 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43980) Add support for EXCEPT in select clause, similar to what databricks provides
Yash Kothari created SPARK-43980: Summary: Add support for EXCEPT in select clause, similar to what databricks provides Key: SPARK-43980 URL: https://issues.apache.org/jira/browse/SPARK-43980 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.0 Reporter: Yash Kothari I'm looking for a way to incorporate the {{select * except(col1, ...)}} clause provided by Databricks into my workflow. I don't use Databricks and would like to introduce this {{select except}} clause either as a spark-package or by contributing a change to Spark. However, I'm unsure about how to begin this process and would appreciate any guidance from the community. [https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html#examples] Thank you for your assistance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43914) Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
[ https://issues.apache.org/jira/browse/SPARK-43914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-43914:
-------------------------------

    Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
    (was: Assign a name to the error class _LEGACY_ERROR_TEMP_2427)

> Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
> --------------------------------------------------------------
>
> Key: SPARK-43914
> URL: https://issues.apache.org/jira/browse/SPARK-43914
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Resolved] (SPARK-43715) Add spark DataFrame binary file format writer
[ https://issues.apache.org/jira/browse/SPARK-43715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weichen Xu resolved SPARK-43715.
--------------------------------
    Resolution: Won't Do

> Add spark DataFrame binary file format writer
> ---------------------------------------------
>
> Key: SPARK-43715
> URL: https://issues.apache.org/jira/browse/SPARK-43715
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 3.5.0
> Reporter: Weichen Xu
> Assignee: Weichen Xu
> Priority: Major
>
> In the new distributed Spark ML module (designed to support Spark Connect and local inference), we need to save ML models to the Hadoop file system in a custom binary file format, because:
> * We often submit a Spark application to a cluster to run the model training job, and we need to save the trained model to the Hadoop file system before the application completes.
> * We also want to support local model inference. If we save the model with the current Spark DataFrame writer (e.g. Parquet format), loading it depends on a Spark service; we want to load the model without one, so the model should be saved in the original binary format that our ML code can handle.
> We already have a reader API for the "binaryFile" format; we need to add a writer API.
> {*}Writer API{*}: given a DataFrame with schema [file_path: String, content: binary], save it to a Hadoop path, writing each row as a file at {hadoop path}/{file_path}; "file_path" can be a multi-part path.
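The proposed writer semantics in the description can be sketched in plain Python (a hedged illustration of the described behavior only; `write_binary_rows` is a hypothetical helper, not the Spark API that was proposed):

```python
import os

def write_binary_rows(rows, root):
    """Save each (file_path, content) row as {root}/{file_path}.

    Mirrors the described semantics: "file_path" may be a multi-part
    path such as "a/b/model.bin", so parent directories are created.
    """
    for file_path, content in rows:
        full = os.path.join(root, file_path)
        parent = os.path.dirname(full)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(full, "wb") as f:  # content is raw bytes
            f.write(content)
```

A local filesystem stand-in is used here; the actual proposal targets Hadoop file systems, so a real implementation would go through the Hadoop FileSystem API instead of `open`.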
[jira] [Resolved] (SPARK-43913) Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
[ https://issues.apache.org/jira/browse/SPARK-43913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-43913.
------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41424
[https://github.com/apache/spark/pull/41424]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
> --------------------------------------------------------------
>
> Key: SPARK-43913
> URL: https://issues.apache.org/jira/browse/SPARK-43913
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Assignee: jiaan.geng
> Priority: Major
> Fix For: 3.5.0
[jira] [Assigned] (SPARK-43913) Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
[ https://issues.apache.org/jira/browse/SPARK-43913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-43913:
--------------------------------

    Assignee: jiaan.geng

> Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
> --------------------------------------------------------------
>
> Key: SPARK-43913
> URL: https://issues.apache.org/jira/browse/SPARK-43913
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Assignee: jiaan.geng
> Priority: Major
[jira] [Resolved] (SPARK-43962) Improve error messages: CANNOT_DECODE_URL, CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE, CANNOT_PARSE_DECIMAL, CANNOT_READ_FILE_FOOTER, CANNOT_RECOGNIZE_HIVE_TYPE.
[ https://issues.apache.org/jira/browse/SPARK-43962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-43962.
------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41455
[https://github.com/apache/spark/pull/41455]

> Improve error messages: CANNOT_DECODE_URL, CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE, CANNOT_PARSE_DECIMAL, CANNOT_READ_FILE_FOOTER, CANNOT_RECOGNIZE_HIVE_TYPE.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-43962
> URL: https://issues.apache.org/jira/browse/SPARK-43962
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Improve error message for usability.
[jira] [Assigned] (SPARK-43962) Improve error messages: CANNOT_DECODE_URL, CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE, CANNOT_PARSE_DECIMAL, CANNOT_READ_FILE_FOOTER, CANNOT_RECOGNIZE_HIVE_TYPE.
[ https://issues.apache.org/jira/browse/SPARK-43962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-43962:
--------------------------------

    Assignee: Haejoon Lee

> Improve error messages: CANNOT_DECODE_URL, CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE, CANNOT_PARSE_DECIMAL, CANNOT_READ_FILE_FOOTER, CANNOT_RECOGNIZE_HIVE_TYPE.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-43962
> URL: https://issues.apache.org/jira/browse/SPARK-43962
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
>
> Improve error message for usability.
[jira] [Assigned] (SPARK-43948) Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]
[ https://issues.apache.org/jira/browse/SPARK-43948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-43948:
--------------------------------

    Assignee: BingKun Pan

> Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]
> ------------------------------------------------------------------------
>
> Key: SPARK-43948
> URL: https://issues.apache.org/jira/browse/SPARK-43948
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
>
> _LEGACY_ERROR_TEMP_0050 => LOCAL_MUST_WITH_SCHEMA_FILE
> _LEGACY_ERROR_TEMP_0057 => UNSUPPORTED_DEFAULT_VALUE.WITHOUT_SUGGESTION
> _LEGACY_ERROR_TEMP_0058 => UNSUPPORTED_DEFAULT_VALUE.WITH_SUGGESTION
> _LEGACY_ERROR_TEMP_0059 => REF_DEFAULT_VALUE_IS_NOT_ALLOWED_IN_PARTITION
[jira] [Resolved] (SPARK-43948) Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]
[ https://issues.apache.org/jira/browse/SPARK-43948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-43948.
------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41451
[https://github.com/apache/spark/pull/41451]

> Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]
> ------------------------------------------------------------------------
>
> Key: SPARK-43948
> URL: https://issues.apache.org/jira/browse/SPARK-43948
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
>
> _LEGACY_ERROR_TEMP_0050 => LOCAL_MUST_WITH_SCHEMA_FILE
> _LEGACY_ERROR_TEMP_0057 => UNSUPPORTED_DEFAULT_VALUE.WITHOUT_SUGGESTION
> _LEGACY_ERROR_TEMP_0058 => UNSUPPORTED_DEFAULT_VALUE.WITH_SUGGESTION
> _LEGACY_ERROR_TEMP_0059 => REF_DEFAULT_VALUE_IS_NOT_ALLOWED_IN_PARTITION
[jira] [Updated] (SPARK-43378) SerializerHelper.deserializeFromChunkedBuffer leaks deserialization streams
[ https://issues.apache.org/jira/browse/SPARK-43378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emil Ejbyfeldt updated SPARK-43378:
-----------------------------------

    Summary: SerializerHelper.deserializeFromChunkedBuffer leaks deserialization streams
    (was: SerializerHelper.deserializeFromChunkedBuffer)

> SerializerHelper.deserializeFromChunkedBuffer leaks deserialization streams
> ---------------------------------------------------------------------------
>
> Key: SPARK-43378
> URL: https://issues.apache.org/jira/browse/SPARK-43378
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Reporter: Emil Ejbyfeldt
> Assignee: Emil Ejbyfeldt
> Priority: Major
> Fix For: 3.4.1, 3.5.0
>
> The method SerializerHelper.deserializeFromChunkedBuffer leaks deserialization streams. This can cause large performance regressions with the Kryo serializer, since the application can become bottlenecked on the driver creating expensive Kryo objects that are then leaked along with the deserialization streams.
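The general shape of this leak, and of the fix, can be sketched with a toy model (hypothetical classes for illustration; this is not Spark's actual `SerializerHelper` code): a deserialization stream that borrows an expensive resource on construction (for Kryo, a pooled serializer instance) must be closed after use, otherwise each call permanently takes the resource out of the pool.

```python
# Toy model of the leak: each stream "borrows" a resource when constructed
# and only returns it when close() is called.
class DeserializationStream:
    open_streams = 0  # stands in for borrowed Kryo instances

    def __init__(self, chunks):
        self._it = iter(chunks)
        DeserializationStream.open_streams += 1

    def read_value(self):
        return next(self._it)

    def close(self):
        DeserializationStream.open_streams -= 1

def deserialize_from_chunked_buffer(chunks):
    stream = DeserializationStream(chunks)
    try:
        return stream.read_value()
    finally:
        stream.close()  # the fix: always release the stream, even on error
```

Without the `finally`, `open_streams` grows by one per call, which is the counter that would climb on a busy driver.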